Sanger Institute - Publications 2010

Number of papers published in 2010: 214

  • A map of human genome variation from population-scale sequencing.

    1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME and McVean GA

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Medical Research Council: G0801823, G0801823(89305), G1000758B; NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01 HG004120, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54 HG003273, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861, T32 GM007753; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z

    Nature 2010;467;7319;1061-73

  • Genetic evidence of multiple loci in dystocia--difficult labour.

    Algovik M, Kivinen K, Peterson H, Westgren M and Kere J

    Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden. michael.algovik@ltkalmar.se

    Background: Dystocia, difficult labour, is a common but also complex problem during childbirth. It can be attributed to either weak contractions of the uterus, a large infant, reduced capacity of the pelvis or combinations of these. Previous studies have indicated that there is a genetic component in the susceptibility of experiencing dystocia. The purpose of this study was to identify susceptibility genes in dystocia.

    Methods: A total of 104 women in 47 families were included where at least two sisters had undergone caesarean section at a gestational length of 286 days or more at their first delivery. Study of medical records and a telephone interview was performed to identify subjects with dystocia. Whole-genome scanning using Affymetrix genotyping-arrays and non-parametric linkage (NPL) analysis was made in 39 women exhibiting the phenotype of dystocia from 19 families. In 68 women re-sequencing was performed of candidate genes showing suggestive linkage: oxytocin (OXT) on chromosome 20 and oxytocin-receptor (OXTR) on chromosome 3.

    Results: We found a trend towards linkage with suggestive NPL-score (3.15) on chromosome 12p12. Suggestive linkage peaks were observed on chromosomes 3, 4, 6, 10, 20. Re-sequencing of OXT and OXTR did not reveal any causal variants.

    Conclusions: Dystocia is likely to have a genetic component with variations in multiple genes affecting the patient outcome. We found 6 loci that could be re-evaluated in larger patient cohorts.

    BMC medical genetics 2010;11;105

  • Data quality control in genetic case-control association studies.

    Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP and Zondervan KT

    Nature protocols 2010;5;1564-73

  • Genome-wide association study of migraine implicates a common susceptibility variant on 8q22.1.

    Anttila V, Stefansson H, Kallela M, Todt U, Terwindt GM, Calafato MS, Nyholt DR, Dimas AS, Freilinger T, Müller-Myhsok B, Artto V, Inouye M, Alakurtti K, Kaunisto MA, Hämäläinen E, de Vries B, Stam AH, Weller CM, Heinze A, Heinze-Kuhn K, Goebel I, Borck G, Göbel H, Steinberg S, Wolf C, Björnsson A, Gudmundsson G, Kirchmann M, Hauge A, Werge T, Schoenen J, Eriksson JG, Hagen K, Stovner L, Wichmann HE, Meitinger T, Alexander M, Moebus S, Schreiber S, Aulchenko YS, Breteler MM, Uitterlinden AG, Hofman A, van Duijn CM, Tikka-Kleemola P, Vepsäläinen S, Lucae S, Tozzi F, Muglia P, Barrett J, Kaprio J, Färkkilä M, Peltonen L, Stefansson K, Zwart JA, Ferrari MD, Olesen J, Daly M, Wessman M, van den Maagdenberg AM, Dichgans M, Kubisch C, Dermitzakis ET, Frants RR, Palotie A and International Headache Genetics Consortium

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. verneri.anttila@sanger.ac.uk

    Migraine is a common episodic neurological disorder, typically presenting with recurrent attacks of severe headache and autonomic dysfunction. Apart from rare monogenic subtypes, no genetic or molecular markers for migraine have been convincingly established. We identified the minor allele of rs1835740 on chromosome 8q22.1 to be associated with migraine (P = 5.38 × 10⁻⁹, odds ratio = 1.23, 95% CI 1.150-1.324) in a genome-wide association study of 2,731 migraine cases ascertained from three European headache clinics and 10,747 population-matched controls. The association was replicated in 3,202 cases and 40,062 controls for an overall meta-analysis P value of 1.69 × 10⁻¹¹ (odds ratio = 1.18, 95% CI 1.127-1.244). rs1835740 is located between MTDH (astrocyte elevated gene 1, also known as AEG-1) and PGCP (encoding plasma glutamate carboxypeptidase). In an expression quantitative trait study in lymphoblastoid cell lines, transcript levels of MTDH were found to have a significant correlation to rs1835740 (P = 3.96 × 10⁻⁵, permuted threshold for genome-wide significance 7.7 × 10⁻⁵). To our knowledge, our data establish rs1835740 as the first genetic risk factor for migraine.

    Funded by: Wellcome Trust: 089062, WT089062

    Nature genetics 2010;42;10;869-73

  • Rare variant association analysis methods for complex traits.

    Asimit J and Zeggini E

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.

    There has been increasing interest in rare variants and their association with disease, and several rare variant-disease associations have already been detected. The usual association tests for common variants are underpowered for detecting variants of lower frequency, so alternative approaches are required. In addition to reviewing the association analysis methods for rare variants, we discuss the limitations of genome-wide association studies in identifying rare variants and the problems that arise in the imputation of rare variants.

    Funded by: Wellcome Trust: WT088885/Z/09/Z

    Annual review of genetics 2010;44;293-308

  • A predominantly neolithic origin for European paternal lineages.

    Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C and Jobling MA

    Department of Genetics, University of Leicester, Leicester, United Kingdom.

    The relative contributions to modern European populations of Paleolithic hunter-gatherers and Neolithic farmers from the Near East have been intensely debated. Haplogroup R1b1b2 (R-M269) is the commonest European Y-chromosomal lineage, increasing in frequency from east to west, and carried by 110 million European men. Previous studies suggested a Paleolithic origin, but here we show that the geographical distribution of its microsatellite diversity is best explained by spread from a single source in the Near East via Anatolia during the Neolithic. Taken with evidence on the origins of other haplogroups, this indicates that most European Y chromosomes originate in the Neolithic expansion. This reinterpretation makes Europe a prime example of how technological and cultural change is linked with the expansion of a Y-chromosomal lineage, and the contrast of this pattern with that shown by maternally inherited mitochondrial DNA suggests a unique role for males in the transition.

    Funded by: Wellcome Trust: 057559, 065569, 084060, 087576

    PLoS biology 2010;8;1;e1000285

  • Curators of the world unite: the International Society of Biocuration.

    Bateman A

    Bioinformatics (Oxford, England) 2010;26;8;991

  • DUFs: families in search of function.

    Bateman A, Coggill P and Finn RD

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England. agb@sanger.ac.uk

    Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.

    Funded by: Wellcome Trust: 087656, WT077044/Z/05/Z

    Acta crystallographica. Section F, Structural biology and crystallization communications 2010;66;Pt 10;1148-52

  • Signatures of adaptation to obligate biotrophy in the Hyaloperonospora arabidopsidis genome.

    Baxter L, Tripathy S, Ishaque N, Boot N, Cabral A, Kemen E, Thines M, Ah-Fong A, Anderson R, Badejoko W, Bittner-Eddy P, Boore JL, Chibucos MC, Coates M, Dehal P, Delehaunty K, Dong S, Downton P, Dumas B, Fabro G, Fronick C, Fuerstenberg SI, Fulton L, Gaulin E, Govers F, Hughes L, Humphray S, Jiang RH, Judelson H, Kamoun S, Kyung K, Meijer H, Minx P, Morris P, Nelson J, Phuntumart V, Qutob D, Rehmany A, Rougon-Cardoso A, Ryden P, Torto-Alalibo T, Studholme D, Wang Y, Win J, Wood J, Clifton SW, Rogers J, Van den Ackerveken G, Jones JD, McDowell JM, Beynon J and Tyler BM

    School of Life Sciences, Warwick University, Wellesbourne, CV35 9EF, UK.

    Many oomycete and fungal plant pathogens are obligate biotrophs, which extract nutrients only from living plant tissue and cannot grow apart from their hosts. Although these pathogens cause substantial crop losses, little is known about the molecular basis or evolution of obligate biotrophy. Here, we report the genome sequence of the oomycete Hyaloperonospora arabidopsidis (Hpa), an obligate biotroph and natural pathogen of Arabidopsis thaliana. In comparison with genomes of related, hemibiotrophic Phytophthora species, the Hpa genome exhibits dramatic reductions in genes encoding (i) RXLR effectors and other secreted pathogenicity proteins, (ii) enzymes for assimilation of inorganic nitrogen and sulfur, and (iii) proteins associated with zoospore formation and motility. These attributes comprise a genomic signature of evolution toward obligate biotrophy.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C509123/1, BB/E024815/1, BB/E024882/1, BB/F0161901, EP/F500025/1, T12144; Wellcome Trust

    Science (New York, N.Y.) 2010;330;6010;1549-51

  • Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06.

    Bennett JS, Bentley SD, Vernikos GS, Quail MA, Cherevach I, White B, Parkhill J and Maiden MC

    Department of Zoology, University of Oxford, UK. julia.bennett@zoo.ox.ac.uk

    Background: The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally.

    Results: Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent.

    Conclusion: The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed.

    Funded by: Wellcome Trust: 087622

    BMC genomics 2010;11;652

  • Variants in ACAD10 are associated with type 2 diabetes, insulin resistance and lipid oxidation in Pima Indians.

    Bian L, Hanson RL, Muller YL, Ma L, MAGIC Investigators, Kobes S, Knowler WC, Bogardus C and Baier LJ

    Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 445 N. 5th Street, Suite 210, Phoenix, AZ 85004, USA.

    Aims/hypothesis: A prior genome-wide association study in Pima Indians identified a variant within the ACAD10 gene that is associated with early-onset type 2 diabetes. Acylcoenzyme A dehydrogenase 10 (ACAD10) catalyses mitochondrial fatty acid beta-oxidation, which plays a pivotal role in developing insulin resistance and type 2 diabetes. Therefore, ACAD10 was analysed as a positional and biological candidate for type 2 diabetes.

    Methods: Twenty-three SNPs were genotyped in 1,500 Pima Indians to determine the linkage disequilibrium pattern across ACAD10. Association with type 2 diabetes was determined by genotyping four tag single nucleotide polymorphisms (SNPs) in a population-based sample of 3,501 full-heritage Pima Indians; two associated SNPs were further genotyped in a second population-based sample of 3,723 American Indians. Associations with quantitative traits were assessed in 415 non-diabetic full heritage Pima individuals who had been metabolically phenotyped.

    Results: SNPs rs601663 and rs659964 were associated with type 2 diabetes in the full-heritage Pima Indian sample (p=0.04 and 0.0006, respectively), and rs659964 was further associated with type 2 diabetes in the second American Indian sample (p=0.04). Combination of these two samples provided the strongest evidence for association (p=0.009 and 0.00007, for rs601663 and rs659964, respectively). Quantitative trait analyses identified nominal associations with both lower lipid oxidation rate and larger subcutaneous abdominal adipocyte size, which is consistent with the known physiology of ACAD10, and also identified associations with increased insulin resistance.

    Conclusions/interpretation: We propose that ACAD10 variation may increase type 2 diabetes susceptibility by impairing insulin sensitivity via abnormal lipid oxidation.

    Funded by: NIDDK NIH HHS: ZIA DK075012-04

    Diabetologia 2010;53;7;1349-53

  • Signatures of mutation and selection in the cancer genome.

    Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The cancer genome is moulded by the dual processes of somatic mutation and selection. Homozygous deletions in cancer genomes occur over recessive cancer genes, where they can confer selective growth advantage, and over fragile sites, where they are thought to reflect an increased local rate of DNA breakage. However, most homozygous deletions in cancer genomes are unexplained. Here we identified 2,428 somatic homozygous deletions in 746 cancer cell lines. These overlie 11% of protein-coding genes that, therefore, are not mandatory for survival of human cells. We derived structural signatures that distinguish between homozygous deletions over recessive cancer genes and fragile sites. Application to clusters of unexplained homozygous deletions suggests that many are in regions of inherent fragility, whereas a small subset overlies recessive cancer genes. The results illustrate how structural signatures can be used to distinguish between the influences of mutation and selection in cancer genomes. The extensive copy number, genotyping, sequence and expression data available for this large series of publicly available cancer cell lines renders them informative reagents for future studies of cancer biology and drug discovery.

    Funded by: NCI NIH HHS: P01 CA155258; Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7283;893-8

  • Large, rare chromosomal deletions associated with severe early-onset obesity.

    Bochukova EG, Huang N, Keogh J, Henning E, Purmann C, Blaszczyk K, Saeed S, Hamilton-Shield J, Clayton-Smith J, O'Rahilly S, Hurles ME and Farooqi IS

    University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.

    Obesity is a highly heritable and genetically heterogeneous disorder. Here we investigated the contribution of copy number variation to obesity in 300 Caucasian patients with severe early-onset obesity, 143 of whom also had developmental delay. Large (>500 kilobases), rare (<1%) deletions were significantly enriched in patients compared to 7,366 controls (P < 0.001). We identified several rare copy number variants that were recurrent in patients but absent or at much lower prevalence in controls. We identified five patients with overlapping deletions on chromosome 16p11.2 that were found in 2 out of 7,366 controls (P < 5 × 10⁻⁵). In three patients the deletion co-segregated with severe obesity. Two patients harboured a larger de novo 16p11.2 deletion, extending through a 593-kilobase region previously associated with autism and mental retardation; both of these patients had mild developmental delay in addition to severe obesity. In an independent sample of 1,062 patients with severe obesity alone, the smaller 16p11.2 deletion was found in an additional two patients. All 16p11.2 deletions encompass several genes but include SH2B1, which is known to be involved in leptin and insulin signalling. Deletion carriers exhibited hyperphagia and severe insulin resistance disproportionate for the degree of obesity. We show that copy number variation contributes significantly to the genetic architecture of human obesity.

    Funded by: Medical Research Council: G0900554; Wellcome Trust: 077014, 077014/Z/05/0Z, 082390, 082390/Z/07/Z, 085475

    Nature 2010;463;7281;666-70

  • Variants at DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci are associated with reduced glucose-stimulated beta cell function in middle-aged Danish people.

    Boesgaard TW, Grarup N, Jørgensen T, Borch-Johnsen K, Meta-Analysis of Glucose and Insulin-Related Trait Consortium (MAGIC), Hansen T and Pedersen O

    Hagedorn Research Institute, Niels Steensens Vej 2, 2820 Gentofte, Denmark.

    Aims/hypothesis: A meta-analysis of 21 genome-wide association studies identified 11 novel genetic loci implicated in fasting glucose homeostasis. We aimed to evaluate the impact of these variants on insulin release and insulin sensitivity estimated from OGTTs.

    Methods: Eleven variants in or near DGKB/TMEM195, ADCY5, MADD, ADRA2A, FADS1, CRY2, SLC2A2, GLIS3, PROX1, C2CD4B and IGF1 were genotyped in 6,784 middle-aged participants of the population-based Inter99 cohort. Association studies of quantitative estimates of insulin release and insulin sensitivity were performed in 5,722 non-diabetic Danish participants on whom an OGTT was performed.

    Results: Assuming an additive genetic model, carriers of the alleles increasing fasting glucose in DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B showed decreased glucose-stimulated insulin release as assessed by the BIGTT-acute insulin response index (2.7-3.5%; p < 0.005 for all) and by corrected insulin response (2.8-5.9%; p < 0.03 for all). In addition, the PROX1 glucose-raising allele showed a 2.9% decreased corrected insulin response (p = 0.03), while the hyperglycaemic allele of variants in or near ADRA2A, FADS1, CRY2 and C2CD4B were associated with a 2.6% to 9.3% decrease in one or both of two different OGTT-based disposition indices (p < 0.02 for all). After correction for multiple testing, variants in the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with estimates of beta cell function.

    Conclusions/interpretation: We found that the lead variants at the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with decreased glucose-stimulated insulin response. This association underlines the importance of pancreatic beta cell dysfunction in the genetic predisposition to hyperglycaemia and type 2 diabetes.

    Diabetologia 2010;53;8;1647-55

  • Large-scale association analysis of TNF/LTA gene region polymorphisms in type 2 diabetes.

    Boraska V, Rayner NW, Groves CJ, Frayling TM, Diakite M, Rockett KA, Kwiatkowski DP, Day-Williams AG, McCarthy MI and Zeggini E

    Department of Medical Biology, University of Split School of Medicine, Split, Croatia. vboraska@mefst.hr

    Background: The TNF/LTA locus has been a long-standing T2D candidate gene. Several studies have examined association of TNF/LTA SNPs with T2D but the majority have been small-scale and produced no convincing evidence of association. The purpose of this study is to examine T2D association of tag SNPs in the TNF/LTA region capturing the majority of common variation in a large-scale sample set of UK/Irish origin.

    Methods: This study comprised a case-control (1520 cases and 2570 control samples) and a family-based component (423 parent-offspring trios). Eleven tag SNPs (rs928815, rs909253, rs746868, rs1041981 (T60N), rs1800750, rs1800629 (G-308A), rs361525 (G-238A), rs3093662, rs3093664, rs3093665, and rs3093668) were selected across the TNF/LTA locus and genotyped using a fluorescence-based competitive allele specific assay. Quality control of the obtained genotypes was performed prior to single- and multi-point association analyses under the additive model.

    Results: We did not find any consistent SNP associations with T2D in the case-control or family-based datasets.

    Conclusions: The present study, designed to analyse a set of tag SNPs specifically selected to capture the majority of common variation in the TNF/LTA gene region, found no robust evidence for association with T2D. To investigate the presence of smaller effects of TNF/LTA gene variation with T2D, a large-scale meta-analysis will be required.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113, WT088885/Z/09/Z

    BMC medical genetics 2010;11;69

  • 53BP1 loss rescues BRCA1 deficiency and is associated with triple-negative and BRCA-mutated breast cancers.

    Bouwman P, Aly A, Escandell JM, Pieterse M, Bartkova J, van der Gulden H, Hiddingh S, Thanasoula M, Kulkarni A, Yang Q, Haffty BG, Tommiska J, Blomqvist C, Drapkin R, Adams DJ, Nevanlinna H, Bartek J, Tarsounas M, Ganesan S and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Germ-line mutations in breast cancer 1, early onset (BRCA1) result in predisposition to breast and ovarian cancer. BRCA1-mutated tumors show genomic instability, mainly as a consequence of impaired recombinatorial DNA repair. Here we identify p53-binding protein 1 (53BP1) as an essential factor for sustaining the growth arrest induced by Brca1 deletion. Depletion of 53BP1 abrogates the ATM-dependent checkpoint response and G2 cell-cycle arrest triggered by the accumulation of DNA breaks in Brca1-deleted cells. This effect of 53BP1 is specific to BRCA1 function, as 53BP1 depletion did not alleviate proliferation arrest or checkpoint responses in Brca2-deleted cells. Notably, loss of 53BP1 partially restores the homologous-recombination defect of Brca1-deleted cells and reverts their hypersensitivity to DNA-damaging agents. We find reduced 53BP1 expression in subsets of sporadic triple-negative and BRCA-associated breast cancers, indicating the potential clinical implications of our findings.

    Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356

    Nature structural & molecular biology 2010;17;6;688-95

  • Rare variation at the TNFAIP3 locus and susceptibility to rheumatoid arthritis.

    Bowes J, Lawrence R, Eyre S, Panoutsopoulou K, Orozco G, Elliott KS, Ke X, Morris AP, UKRAG, Thomson W, Worthington J, Barton A and Zeggini E

    Arthritis Research UK, Epidemiology Unit, University of Manchester, Manchester, UK.

    Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphisms (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower frequency variants is yet to be practically investigated. Here we describe the application of a rare variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed genotypes of low frequency variants (defined here as MAF ≤ 0.05) into a single count coupled with univariate analysis. We then prioritized gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on rare variant signal p value and prior evidence to support involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected replicating association to low frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10⁻⁶). Even though rare variants are not well-represented and can be difficult to genotype in GWAS, our study supports the application of low frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.

    Funded by: Arthritis Research UK: 17552, 18475; Wellcome Trust: 064890, 081682

    Human genetics 2010;128;6;627-33

  • Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

    Brister JR, Bao Y, Kuiken C, Lefkowitz EJ, Le Mercier P, Leplae R, Madupu R, Scheuermann RH, Schobel S, Seto D, Shrivastava S, Sterk P, Zeng Q, Klimke W and Tatusova T

    Viruses-Basel 2010;2;2258-68

  • Scoring and validation of tandem MS peptide identification methods.

    Brosch M and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.

    Methods in molecular biology (Clifton, N.J.) 2010;604;43-53

  • Quantifying the mechanisms of domain gain in animal proteins.

    Buljan M, Frankish A and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. mb613@cam.ac.uk

    Background: Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms.

    Results: Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events.

    Conclusions: The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.

    Genome biology 2010;11;7;R74

  • The patterns and dynamics of genomic instability in metastatic pancreatic cancer.

    Campbell PJ, Yachida S, Mudie LJ, Stephens PJ, Pleasance ED, Stebbings LA, Morsberger LA, Latimer C, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal SA, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Griffin CA, Burton J, Swerdlow H, Quail MA, Stratton MR, Iacobuzio-Donahue C and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97-98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations, but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours, including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites, and how the tumour disseminates. Here we harness advances in DNA sequencing to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2-M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.

    Funded by: NCI NIH HHS: CA106610, CA140599, K08 CA106610, K08 CA106610-03, K08 CA106610-04, K08 CA106610-05, R01 CA140599, R01 CA140599-01, R01 CA140599-02, R01 CA140599-03; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA

    Nature 2010;467;7319;1109-13

  • Beyond the Genome: genomics research ten years after the human genome sequence.

    Casto AM and Amid C

    Department of Genetics, Stanford University, Stanford, CA 94305, USA. morgan21@stanford.edu

    A report on the meeting 'Beyond the Genome', Boston, USA, 11-13 October 2010.

    Genome biology 2010;11;11;309

  • Molecular and physiological analysis of three Pseudomonas aeruginosa phages belonging to the "N4-like viruses".

    Ceyssens PJ, Brabban A, Rogge L, Lewis MS, Pickard D, Goulding D, Dougan G, Noben JP, Kropinski A, Kutter E and Lavigne R

    Division of Gene Technology, Katholieke Universiteit Leuven, Kasteelpark Arenberg, Leuven, B-3001, Belgium.

    We present a detailed analysis of the genome architecture, structural proteome and infection-related properties of three Pseudomonas phages, designated LUZ7, LIT1 and PEV2. These podoviruses encapsulate 72.5 to 74.9 kb genomes and lyse their host after 25 min aerobic infection. PEV2 can successfully infect under anaerobic conditions, but its latent period is tripled, the lysis proceeds far slower and the burst size decreases significantly. While the overall genome structure of these phages resembles the well-studied coliphage N4, these Pseudomonas phages encode a cluster of tail genes which displays significant similarity to a Pseudomonas aeruginosa (cryptic) prophage region. Using ESI-MS/MS, these tail proteins were shown to be part of the phage particle, as well as ten other proteins including a giant 370 kDa virion RNA polymerase. These phages are the first described representatives of a novel kind of obligatory lytic P. aeruginosa-infecting phages, belonging to the widespread "N4-like viruses" genus.

    Funded by: Wellcome Trust

    Virology 2010;405;1;26-30

  • Genetic loci influencing kidney function and chronic kidney disease.

    Chambers JC, Zhang W, Lord GM, van der Harst P, Lawlor DA, Sehmi JS, Gale DP, Wass MN, Ahmadi KR, Bakker SJ, Beckmann J, Bilo HJ, Bochud M, Brown MJ, Caulfield MJ, Connell JM, Cook HT, Cotlarciuc I, Davey Smith G, de Silva R, Deng G, Devuyst O, Dikkeschei LD, Dimkovic N, Dockrell M, Dominiczak A, Ebrahim S, Eggermann T, Farrall M, Ferrucci L, Floege J, Forouhi NG, Gansevoort RT, Han X, Hedblad B, Homan van der Heide JJ, Hepkema BG, Hernandez-Fuentes M, Hypponen E, Johnson T, de Jong PE, Kleefstra N, Lagou V, Lapsley M, Li Y, Loos RJ, Luan J, Luttropp K, Maréchal C, Melander O, Munroe PB, Nordfors L, Parsa A, Peltonen L, Penninx BW, Perucha E, Pouta A, Prokopenko I, Roderick PJ, Ruokonen A, Samani NJ, Sanna S, Schalling M, Schlessinger D, Schlieper G, Seelen MA, Shuldiner AR, Sjögren M, Smit JH, Snieder H, Soranzo N, Spector TD, Stenvinkel P, Sternberg MJ, Swaminathan R, Tanaka T, Ubink-Veltmaat LJ, Uda M, Vollenweider P, Wallace C, Waterworth D, Zerres K, Waeber G, Wareham NJ, Maxwell PH, McCarthy MI, Jarvelin MR, Mooser V, Abecasis GR, Lightstone L, Scott J, Navis G, Elliott P and Kooner JS

    Department of Epidemiology and Biostatistics, School of Public Health, Imperial College of London, London, UK. john.chambers@ic.ac.uk

    Using genome-wide association, we identify common variants at 2p12-p13, 6q26, 17q23 and 19q13 associated with serum creatinine, a marker of kidney function (P = 10(-10) to 10(-15)). Of these, rs10206899 (near NAT8, 2p12-p13) and rs4805834 (near SLC7A9, 19q13) were also associated with chronic kidney disease (P = 5.0 x 10(-5) and P = 3.6 x 10(-4), respectively). Our findings provide insight into metabolic, solute and drug-transport pathways underlying susceptibility to chronic kidney disease.

    Nature genetics 2010;42;5;373-5

  • The impact of gene expression regulation on evolution of extracellular signaling pathways.

    Charoensawan V, Adryan B, Martin S, Söllner C, Thisse B, Thisse C, Wright GJ and Teichmann SA

    Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom. varodom@mrc-lmb.cam.ac.uk

    Extracellular protein interactions are crucial to the development of multicellular organisms because they initiate signaling pathways and enable cellular recognition cues. Despite their importance, extracellular protein interactions are often under-represented in large scale protein interaction data sets because most high throughput assays are not designed to detect low affinity extracellular interactions. Due to the lack of a comprehensive data set, the evolution of extracellular signaling pathways has remained largely a mystery. We investigated this question using a combined data set of physical pairwise interactions between zebrafish extracellular proteins, mainly from the immunoglobulin superfamily and leucine-rich repeat families, and their spatiotemporal expression profiles. We took advantage of known homology between proteins to estimate the relative rates of changes of four parameters after gene duplication, namely extracellular protein interaction, expression pattern, and the divergence of extracellular and intracellular protein sequences. We showed that change in expression profile is a major contributor to the evolution of signaling pathways followed by divergence in intracellular protein sequence, whereas extracellular sequence and interaction profiles were relatively more conserved. Rapidly evolving expression profiles will eventually drive other parameters to diverge more quickly because differentially expressed proteins get exposed to different environments and potential binding partners. This allows homologous extracellular receptors to attain specialized functions and become specific to tissues and/or developmental stages.

    Funded by: Medical Research Council: MC_U105161047; Wellcome Trust: 077108/Z/05/Z

    Molecular & cellular proteomics : MCP 2010;9;12;2666-77

  • Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042.

    Chaudhuri RR, Sebaihia M, Hobman JL, Webber MA, Leyton DL, Goldberg MD, Cunningham AF, Scott-Tucker A, Ferguson PR, Thomas CM, Frankel G, Tang CM, Dudley EG, Roberts IS, Rasko DA, Pallen MJ, Parkhill J, Nataro JP, Thomson NR and Henderson IR

    Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.

    Background: Escherichia coli can experience a multifaceted life, in some cases acting as a commensal while in other cases causing intestinal and/or extraintestinal disease. Several studies suggest enteroaggregative E. coli are the predominant cause of E. coli-mediated diarrhea in the developed world and are second only to Campylobacter sp. as a cause of bacterial-mediated diarrhea. Furthermore, enteroaggregative E. coli are a predominant cause of persistent diarrhea in the developing world where infection has been associated with malnourishment and growth retardation.

    Methods: In this study we determined the complete genomic sequence of E. coli 042, the prototypical member of the enteroaggregative E. coli, which has been shown to cause disease in volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains revealing previously uncharacterised virulence factors including a variety of secreted proteins and a capsular polysaccharide biosynthetic locus. In addition, by using Biolog Phenotype Microarrays we have provided a full metabolic profiling of E. coli 042 and the non-pathogenic lab strain E. coli K-12. We have highlighted the genetic basis for many of the metabolic differences between E. coli 042 and E. coli K-12.

    Conclusion: This study provides a genetic context for the vast amount of experimental and epidemiological data published thus far and provides a template for future diagnostic and intervention strategies.

    Funded by: Medical Research Council: G0801209, G9818340B

    PloS one 2010;5;1;e8801

  • Ensembl variation resources.

    Chen Y, Cunningham F, Rios D, McLaren WM, Smith J, Pritchard B, Spudich GM, Brent S, Kulesha E, Marin-Garcia P, Smedley D, Birney E and Flicek P

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.

    Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.

    Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.

    Funded by: Medical Research Council; Wellcome Trust

    BMC genomics 2010;11;293

  • Common variants near TERC are associated with mean telomere length.

    Codd V, Mangino M, van der Harst P, Braund PS, Kaiser M, Beveridge AJ, Rafelt S, Moore J, Nelson C, Soranzo N, Zhai G, Valdes AM, Blackburn H, Mateo Leach I, de Boer RA, Kimura M, Aviv A, Wellcome Trust Case Control Consortium, Goodall AH, Ouwehand W, van Veldhuisen DJ, van Gilst WH, Navis G, Burton PR, Tobin MD, Hall AS, Thompson JR, Spector T and Samani NJ

    Department of Cardiovascular Sciences, University of Leicester, Glenfield Hospital, Leicester, UK.

    We conducted genome-wide association analyses of mean leukocyte telomere length in 2,917 individuals, with follow-up replication in 9,492 individuals. We identified an association with telomere length on 3q26 (rs12696304, combined P = 3.72 x 10(-14)) at a locus that includes TERC, which encodes the telomerase RNA component. Each copy of the minor allele of rs12696304 was associated with an approximately 75-base-pair reduction in mean telomere length, equivalent to approximately 3.6 years of age-related telomere-length attrition.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust

    Nature genetics 2010;42;3;197-9

  • The dopamine β-hydroxylase -1021C/T polymorphism is associated with the risk of Alzheimer's disease in the Epistasis Project.

    Combarros O, Warden DR, Hammond N, Cortina-Borja M, Belbin O, Lehmann MG, Wilcock GK, Brown K, Kehoe PG, Barber R, Coto E, Alvarez V, Deloukas P, Gwilliam R, Heun R, Kölsch H, Mateo I, Oulhaj A, Arias-Vásquez A, Schuur M, Aulchenko YS, Ikram MA, Breteler MM, van Duijn CM, Morgan K, Smith AD and Lehmann DJ

    Neurology Service and Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas, Marqués de Valdecilla University Hospital (University of Cantabria), 39008 Santander, Spain. combarro@unican.es

    Background: The loss of noradrenergic neurones of the locus coeruleus is a major feature of Alzheimer's disease (AD). Dopamine β-hydroxylase (DBH) catalyses the conversion of dopamine to noradrenaline. Interactions have been reported between the low-activity -1021T allele (rs1611115) of DBH and polymorphisms of the pro-inflammatory cytokine genes, IL1A and IL6, contributing to the risk of AD. We therefore examined the associations with AD of the DBH -1021T allele and of the above interactions in the Epistasis Project, with 1757 cases of AD and 6294 elderly controls.

    Methods: We genotyped eight single nucleotide polymorphisms (SNPs) in the three genes, DBH, IL1A and IL6. We used logistic regression models and synergy factor analysis to examine potential interactions and associations with AD.

    Results: We found that the presence of the -1021T allele was associated with AD: odds ratio = 1.2 (95% confidence interval: 1.06-1.4, p = 0.005). This association was nearly restricted to men < 75 years old: odds ratio = 2.2 (1.4-3.3, 0.0004). We also found an interaction between the presence of DBH -1021T and the -889TT genotype (rs1800587) of IL1A: synergy factor = 1.9 (1.2-3.1, 0.005). All these results were consistent between North Europe and North Spain.

    Conclusions: Extensive, previous evidence (reviewed here) indicates an important role for noradrenaline in the control of inflammation in the brain. Thus, the -1021T allele with presumed low activity may be associated with misregulation of inflammation, which could contribute to the onset of AD. We suggest that such misregulation is the predominant mechanism of the association we report here.

    Funded by: Medical Research Council: G0400546

    BMC medical genetics 2010;11;162

  • Mutation spectrum revealed by breakpoint sequencing of human germline CNVs.

    Conrad DF, Bird C, Blackburne B, Lindsay S, Mamanova L, Lee C, Turner DJ and Hurles ME

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.

    Funded by: Wellcome Trust: 077014, 077014/Z/05/Z

    Nature genetics 2010;42;5;385-91

  • Origins and functional impact of copy number variation in the human genome.

    Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW and Hurles ME

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.

    Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

    Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221; NIGMS NIH HHS: GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014, 088340

    Nature 2010;464;7289;704-12

  • Strong genetic evidence for a selective influence of GABAA receptors on a component of the bipolar disorder phenotype.

    Craddock N, Jones L, Jones IR, Kirov G, Green EK, Grozeva D, Moskvina V, Nikolov I, Hamshere ML, Vukcevic D, Caesar S, Gordon-Smith K, Fraser C, Russell E, Norton N, Breen G, St Clair D, Collier DA, Young AH, Ferrier IN, Farmer A, McGuffin P, Holmans PA, Wellcome Trust Case Control Consortium (WTCCC), Donnelly P, Owen MJ and O'Donovan MC

    Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, UK. craddockn@cardiff.ac.uk

    Despite compelling evidence for a major genetic contribution to risk of bipolar mood disorder, conclusive evidence implicating specific genes or pathophysiological systems has proved elusive. In part this is likely to be related to the unknown validity of current phenotype definitions and consequent aetiological heterogeneity of samples. In the recent Wellcome Trust Case Control Consortium genome-wide association analysis of bipolar disorder (1868 cases, 2938 controls) one of the most strongly associated polymorphisms lay within the gene encoding the GABA(A) receptor beta1 subunit, GABRB1. Aiming to increase biological homogeneity, we sought the diagnostic subset that showed the strongest signal at this polymorphism and used this to test for independent evidence of association with other members of the GABA(A) receptor gene family. The index signal was significantly enriched in the 279 cases meeting Research Diagnostic Criteria for schizoaffective disorder, bipolar type (P=3.8 x 10(-6)). Independently, these cases showed strong evidence that variation in GABA(A) receptor genes influences risk for this phenotype (independent system-wide P=6.6 x 10(-5)) with association signals also at GABRA4, GABRB3, GABRA5 and GABRR3. Our findings have the potential to inform understanding of presentation, pathogenesis and nosology of bipolar disorders. Our method of phenotype refinement may be useful in studies of other complex psychiatric and non-psychiatric disorders.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 079643

    Molecular psychiatry 2010;15;2;146-53

  • A rapid and scalable method for selecting recombinant mouse monoclonal antibodies.

    Crosnier C, Staudt N and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK.

    Background: Monoclonal antibodies with high affinity and selectivity that work on wholemount fixed tissues are valuable reagents to the cell and developmental biologist, and yet isolating them remains a long and unpredictable process. Here we report a rapid and scalable method to select and express recombinant mouse monoclonal antibodies that are essentially equivalent to those secreted by parental IgG-isotype hybridomas.

    Results: Increased throughput was achieved by immunizing mice with pools of antigens and cloning - from small numbers of hybridoma cells - the functionally rearranged light and heavy chains into a single expression plasmid. By immunizing with the ectodomains of zebrafish cell surface receptor proteins expressed in mammalian cells and screening for formalin-resistant epitopes, we selected antibodies that gave expected staining patterns on wholemount fixed zebrafish embryos.

    Conclusions: This method can be used to quickly select several high quality monoclonal antibodies from a single immunized mouse and facilitates their distribution using plasmids.

    Funded by: NINDS NIH HHS: R01NS063400; Wellcome Trust: 077108/Z/05/Z

    BMC biology 2010;8;76

  • A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407.

    Crossman LC, Chaudhuri RR, Beatson SA, Wells TJ, Desvaux M, Cunningham AF, Petty NK, Mahon V, Brinkley C, Hobman JL, Savarino SJ, Turner SM, Pallen MJ, Penn CW, Parkhill J, Turner AK, Johnson TJ, Thomson NR, Smith SG and Henderson IR

    The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, United Kingdom.

    In most cases, Escherichia coli exists as a harmless commensal organism, but it may on occasion cause intestinal and/or extraintestinal disease. Enterotoxigenic E. coli (ETEC) is the predominant cause of E. coli-mediated diarrhea in the developing world and is responsible for a significant portion of pediatric deaths. In this study, we determined the complete genomic sequence of E. coli H10407, a prototypical strain of enterotoxigenic E. coli, which reproducibly elicits diarrhea in human volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains, revealing that the chromosome is closely related to that of the nonpathogenic commensal strain E. coli HS and to those of the laboratory strains E. coli K-12 and C. Furthermore, these analyses demonstrated that there were no chromosomally encoded factors unique to any sequenced ETEC strains. Comparison of the E. coli H10407 plasmids with those from several ETEC strains revealed that the plasmids had a mosaic structure but that several loci were conserved among ETEC strains. This study provides a genetic context for the vast amount of experimental and epidemiological data that have been published.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/C510075/1; Medical Research Council: G0801209, G9818340B; Wellcome Trust

    Journal of bacteriology 2010;192;21;5822-31

  • Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes.

    Dalgliesh GL, Furge K, Greenman C, Chen L, Bignell G, Butler A, Davies H, Edkins S, Hardy C, Latimer C, Teague J, Andrews J, Barthorpe S, Beare D, Buck G, Campbell PJ, Forbes S, Jia M, Jones D, Knott H, Kok CY, Lau KW, Leroy C, Lin ML, McBride DJ, Maddison M, Maguire S, McLay K, Menzies A, Mironenko T, Mulderrig L, Mudie L, O'Meara S, Pleasance E, Rajasingham A, Shepherd R, Smith R, Stebbings L, Stephens P, Tang G, Tarpey PS, Turrell K, Dykema KJ, Khoo SK, Petillo D, Wondergem B, Anema J, Kahnoski RJ, Teh BT, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Clear cell renal cell carcinoma (ccRCC) is the most common form of adult kidney cancer, characterized by the presence of inactivating mutations in the VHL gene in most cases, and by infrequent somatic mutations in known cancer genes. To determine further the genetics of ccRCC, we have sequenced 101 cases through 3,544 protein-coding genes. Here we report the identification of inactivating mutations in two genes encoding enzymes involved in histone modification: SETD2, a histone H3 lysine 36 methyltransferase, and JARID1C (also known as KDM5C), a histone H3 lysine 4 demethylase. We also report mutations in the histone H3 lysine 27 demethylase, UTX (KDM6A), that we recently reported. The results highlight the role of mutations in components of the chromatin modification machinery in human cancer. Furthermore, NF2 mutations were found in non-VHL mutated ccRCC, and several other probable cancer genes were identified. These results indicate that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene, and that systematic screens will be key to fully determining the somatic genetic architecture of cancer.

    Funded by: Wellcome Trust: 077012, 077012/Z/05/Z, 082359, 088340, 093867

    Nature 2010;463;7279;360-3

  • Analysis of TBC1D4 in patients with severe insulin resistance.

    Dash S, Langenberg C, Fawcett KA, Semple RK, Romeo S, Sharp S, Sano H, Lienhard GE, Rochford JJ, Howlett T, Massoud AF, Hindmarsh P, Howell SJ, Wilkinson RJ, Lyssenko V, Groop L, Baroni MG, Barroso I, Wareham NJ, O'Rahilly S and Savage DB

    Funded by: Medical Research Council: G0600414, G0800203, MC_U106179471, MC_U117588499; NIDDK NIH HHS: DK25336, R01 DK025336, R56 DK025336; Wellcome Trust: 072070, 077016, 088316

    Diabetologia 2010;53;6;1239-42

  • Leishmania-specific surface antigens show sub-genus sequence variation and immune recognition.

    Depledge DP, MacLean LM, Hodgkinson MR, Smith BA, Jackson AP, Ma S, Uliana SR and Smith DF

    Centre for Immunology and Infection, Department of Biology, Hull York Medical School, University of York, York, United Kingdom.

    Background: A family of hydrophilic acylated surface (HASP) proteins, containing extensive and variant amino acid repeats, is expressed at the plasma membrane in infective extracellular (metacyclic) and intracellular (amastigote) stages of Old World Leishmania species. While HASPs are antigenic in the host and can induce protective immune responses, the biological functions of these Leishmania-specific proteins remain unresolved. Previous genome analysis has suggested that parasites of the sub-genus Leishmania (Viannia) have lost HASP genes from their genomes.

    We have used molecular and cellular methods to analyse HASP expression in New World Leishmania mexicana complex species and show that, unlike in L. major, these proteins are expressed predominantly following differentiation into amastigotes within macrophages. Further genome analysis has revealed that the L. (Viannia) species, L. (V.) braziliensis, does express HASP-like proteins of low amino acid similarity but with similar biochemical characteristics, from genes present on a region of chromosome 23 that is syntenic with the HASP/SHERP locus in Old World Leishmania species and the L. (L.) mexicana complex. A related gene is also present in Leptomonas seymouri and this may represent the ancestral copy of these Leishmania-genus specific sequences. The L. braziliensis HASP-like proteins (named the orthologous (o) HASPs) are predominantly expressed on the plasma membrane in amastigotes and are recognised by immune sera taken from 4 out of 6 leishmaniasis patients tested in an endemic region of Brazil. Analysis of the repetitive domains of the oHASPs has shown considerable genetic variation in parasite isolates taken from the same patients, suggesting that antigenic change may play a role in immune recognition of this protein family.

    These findings confirm that antigenic hydrophilic acylated proteins are expressed from genes in the same chromosomal region in species across the genus Leishmania. These proteins are surface-exposed on amastigotes (although L. (L.) major parasites also express HASPB on the metacyclic plasma membrane). The central repetitive domains of the HASPs are highly variant in their amino acid sequences, both within and between species, consistent with a role in immune recognition in the host.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0900950, G9721629; Wellcome Trust: 048615, 076355, 077503

    PLoS neglected tropical diseases 2010;4;9;e829

  • Ectodomains of the LDL receptor-related proteins LRP1b and LRP4 have anchorage independent functions in vivo.

    Dietrich MF, van der Weyden L, Prosser HM, Bradley A, Herz J and Adams DJ

    Department of Molecular Genetics, UT Southwestern, Dallas, Texas, United States of America.

    Background: The low-density lipoprotein (LDL) receptor gene family is a highly conserved group of membrane receptors with diverse functions in developmental processes, lipoprotein trafficking, and cell signaling. The low-density lipoprotein (LDL) receptor-related protein 1b (LRP1B) was reported to be deleted in several types of human malignancies, including non-small cell lung cancer. Our group has previously reported that a distal extracellular truncation of murine Lrp1b that is predicted to secrete the entire intact extracellular domain (ECD) is fully viable with no apparent phenotype.

    Methods and principal findings: Here, we have used a gene targeting approach to create two mouse lines carrying internally rearranged exons of Lrp1b that are predicted to truncate the protein closer to the N-terminus and to prevent normal trafficking through the secretory pathway. Both mutations result in early embryonic lethality, but, as expected from the restricted expression pattern of LRP1b in vivo, loss of Lrp1b does not cause cellular lethality as homozygous Lrp1b-deficient blastocysts can be propagated normally in culture. This is similar to findings for another LDL receptor family member, Lrp4. We provide in vitro evidence that Lrp4 undergoes regulated intramembraneous processing through metalloproteases and gamma-secretase cleavage. We further demonstrate negative regulation of the Wnt signaling pathway by the soluble extracellular domain.

    Conclusions and significance: Our results underline a crucial role for Lrp1b in development. The expression in mice of truncated alleles of Lrp1b and Lrp4 with deletions of the transmembrane and intracellular domains leads to release of the extracellular domain into the extracellular space, which is sufficient to confer viability. In contrast, null mutations are embryonically (Lrp1b) or perinatally (Lrp4) lethal. These findings suggest that the extracellular domains of both proteins may function as a scavenger for signaling ligands or signal modulators in the extracellular space, thereby preserving signaling thresholds that are critical for embryonic development, as well as for the clear, but poorly understood role of LRP1b in cancer.

    Funded by: Cancer Research UK; NHLBI NIH HHS: R37 HL063762; Wellcome Trust

    PloS one 2010;5;4;e9960

  • Multiple common variants for celiac disease influencing immune gene expression.

    Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Adány R, Aromaa A, Bardella MT, van den Berg LH, Bockett NA, de la Concha EG, Dema B, Fehrmann RS, Fernández-Arquero M, Fiatal S, Grandone E, Green PM, Groen HJ, Gwilliam R, Houwen RH, Hunt SE, Kaukinen K, Kelleher D, Korponay-Szabo I, Kurppa K, MacMathuna P, Mäki M, Mazzilli MC, McCann OT, Mearin ML, Mein CA, Mirza MM, Mistry V, Mora B, Morley KI, Mulder CJ, Murray JA, Núñez C, Oosterom E, Ophoff RA, Polanco I, Peltonen L, Platteel M, Rybak A, Salomaa V, Schweizer JJ, Sperandeo MP, Tack GJ, Turner G, Veldink JH, Verbeek WH, Weersma RK, Wolters VM, Urcelay E, Cukrowska B, Greco L, Neuhausen SL, McManus R, Barisani D, Deloukas P, Barrett JC, Saavalainen P, Wijmenga C and van Heel DA

    Blizard Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.

    We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10(-4) and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 x 10(-8)); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.

    Funded by: Medical Research Council: G0700545, G0700545(82277); NIDDK NIH HHS: DK050678, DK071003, DK081645, DK57892, R01 DK081645; NINDS NIH HHS: NS058980; Wellcome Trust: 084743

    Nature genetics 2010;42;4;295-302

  • New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.

    Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia-Naji N, Gloyn AL, Lindgren CM, Mägi R, Morris AP, Randall J, Johnson T, Elliott P, Rybin D, Thorleifsson G, Steinthorsdottir V, Henneman P, Grallert H, Dehghan A, Hottenga JJ, Franklin CS, Navarro P, Song K, Goel A, Perry JR, Egan JM, Lajunen T, Grarup N, Sparsø T, Doney A, Voight BF, Stringham HM, Li M, Kanoni S, Shrader P, Cavalcanti-Proença C, Kumari M, Qi L, Timpson NJ, Gieger C, Zabena C, Rocheleau G, Ingelsson E, An P, O'Connell J, Luan J, Elliott A, McCarroll SA, Payne F, Roccasecca RM, Pattou F, Sethupathy P, Ardlie K, Ariyurek Y, Balkau B, Barter P, Beilby JP, Ben-Shlomo Y, Benediktsson R, Bennett AJ, Bergmann S, Bochud M, Boerwinkle E, Bonnefond A, Bonnycastle LL, Borch-Johnsen K, Böttcher Y, Brunner E, Bumpstead SJ, Charpentier G, Chen YD, Chines P, Clarke R, Coin LJ, Cooper MN, Cornelis M, Crawford G, Crisponi L, Day IN, de Geus EJ, Delplanque J, Dina C, Erdos MR, Fedson AC, Fischer-Rosinsky A, Forouhi NG, Fox CS, Frants R, Franzosi MG, Galan P, Goodarzi MO, Graessler J, Groves CJ, Grundy S, Gwilliam R, Gyllensten U, Hadjadj S, Hallmans G, Hammond N, Han X, Hartikainen AL, Hassanali N, Hayward C, Heath SC, Hercberg S, Herder C, Hicks AA, Hillman DR, Hingorani AD, Hofman A, Hui J, Hung J, Isomaa B, Johnson PR, Jørgensen T, Jula A, Kaakinen M, Kaprio J, Kesaniemi YA, Kivimaki M, Knight B, Koskinen S, Kovacs P, Kyvik KO, Lathrop GM, Lawlor DA, Le Bacquer O, Lecoeur C, Li Y, Lyssenko V, Mahley R, Mangino M, Manning AK, Martínez-Larrad MT, McAteer JB, McCulloch LJ, McPherson R, Meisinger C, Melzer D, Meyre D, Mitchell BD, Morken MA, Mukherjee S, Naitza S, Narisu N, Neville MJ, Oostra BA, Orrù M, Pakyz R, Palmer CN, Paolisso G, Pattaro C, Pearson D, Peden JF, Pedersen NL, Perola M, Pfeiffer AF, Pichler I, Polasek O, Posthuma D, Potter SC, Pouta A, Province MA, Psaty BM, Rathmann W, Rayner NW, Rice K, Ripatti S, Rivadeneira F, Roden M, Rolandsson O, Sandbaek A, Sandhu M, Sanna S, Sayer AA, Scheet P, Scott LJ, Seedorf U, Sharp SJ, Shields B, Sigurðsson G, Sijbrands EJ, Silveira A, Simpson L, Singleton A, Smith NL, Sovio U, Swift A, Syddall H, Syvänen AC, Tanaka T, Thorand B, Tichet J, Tönjes A, Tuomi T, Uitterlinden AG, van Dijk KW, van Hoek M, Varma D, Visvikis-Siest S, Vitart V, Vogelzangs N, Waeber G, Wagner PJ, Walley A, Walters GB, Ward KL, Watkins H, Weedon MN, Wild SH, Willemsen G, Witteman JC, Yarnell JW, Zeggini E, Zelenika D, Zethelius B, Zhai G, Zhao JH, Zillikens MC, DIAGRAM Consortium, GIANT Consortium, Global BPgen Consortium, Borecki IB, Loos RJ, Meneton P, Magnusson PK, Nathan DM, Williams GH, Hattersley AT, Silander K, Salomaa V, Smith GD, Bornstein SR, Schwarz P, Spranger J, Karpe F, Shuldiner AR, Cooper C, Dedoussis GV, Serrano-Ríos M, Morris AD, Lind L, Palmer LJ, Hu FB, Franks PW, Ebrahim S, Marmot M, Kao WH, Pankow JS, Sampson MJ, Kuusisto J, Laakso M, Hansen T, Pedersen O, Pramstaller PP, Wichmann HE, Illig T, Rudan I, Wright AF, Stumvoll M, Campbell H, Wilson JF, Anders Hamsten on behalf of Procardis Consortium, MAGIC investigators, Bergman RN, Buchanan TA, Collins FS, Mohlke KL, Tuomilehto J, Valle TT, Altshuler D, Rotter JI, Siscovick DS, Penninx BW, Boomsma DI, Deloukas P, Spector TD, Frayling TM, Ferrucci L, Kong A, Thorsteinsdottir U, Stefansson K, van Duijn CM, Aulchenko YS, Cao A, Scuteri A, Schlessinger D, Uda M, Ruokonen A, Jarvelin MR, Waterworth DM, Vollenweider P, Peltonen L, Mooser V, Abecasis GR, Wareham NJ, Sladek R, Froguel P, Watanabe RM, Meigs JB, Groop L, Boehnke M, McCarthy MI, Florez JC and Barroso I

    Department of Biostatistics, Boston University School of Public Health, Massachusetts, USA.

    Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.

    Funded by: British Heart Foundation: RG/07/008/23674; Chief Scientist Office: CZB/4/710; Medical Research Council: G0100222, G0600331, G0600705, G0601261, G0700222, G0700222(81696), G0701863, G0801056, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128, MC_U127592696, MC_U137686857, MC_UP_A620_1014, MC_UP_A620_1015; NIDDK NIH HHS: K24 DK080140, P30 DK040561, P30 DK040561-14, P30 DK072488, R01 DK029867, R01 DK072193, R01 DK078616, R01 DK078616-01A1; The Dunhill Medical Trust: R69/0208; Wellcome Trust: 064890, 077011, 077016, 081682, 088885, 089061, 090532, 091746

    Nature genetics 2010;42;2;105-16

  • Traces of sub-Saharan and Middle Eastern lineages in Indian Muslim populations.

    Eaaswarkhanth M, Haque I, Ravesh Z, Romero IG, Meganathan PR, Dubey B, Khan FA, Chaubey G, Kivisild T, Tyler-Smith C, Singh L and Thangaraj K

    National DNA Analysis Centre, Central Forensic Science Laboratory, Kolkata, India.

    Islam is the second most practiced religion in India, next to Hinduism. It is still unclear whether the spread of Islam in India has been only a cultural transformation or is associated with detectable levels of gene flow. To estimate the contribution of West Asian and Arabian admixture to Indian Muslims, we assessed genetic variation in mtDNA, Y-chromosomal and LCT/MCM6 markers in 472, 431 and 476 samples, respectively, representing six Muslim communities from different geographical regions of India. We found that most of the Indian Muslim populations received their major genetic input from geographically close non-Muslim populations. However, low levels of likely sub-Saharan African, Arabian and West Asian admixture were also observed among Indian Muslims in the form of L0a2a2 mtDNA and E1b1b1a and J(*)(xJ2) Y-chromosomal lineages. The distinction between Iranian and Arabian sources was difficult to make with mtDNA and the Y chromosome, as the estimates were highly correlated because of similar gene pool compositions in the sources. In contrast, the LCT/MCM6 locus, which shows a clear distinction between the two sources, enabled us to rule out significant gene flow from Arabia. Overall, our results support a model according to which the spread of Islam in India was predominantly cultural conversion associated with minor but still detectable levels of gene flow from outside, primarily from Iran and Central Asia, rather than directly from the Arabian Peninsula.

    Funded by: Wellcome Trust: 077009

    European journal of human genetics : EJHG 2010;18;3;354-63

  • Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies.

    Elks CE, Perry JR, Sulem P, Chasman DI, Franceschini N, He C, Lunetta KL, Visser JA, Byrne EM, Cousminer DL, Gudbjartsson DF, Esko T, Feenstra B, Hottenga JJ, Koller DL, Kutalik Z, Lin P, Mangino M, Marongiu M, McArdle PF, Smith AV, Stolk L, van Wingerden SH, Zhao JH, Albrecht E, Corre T, Ingelsson E, Hayward C, Magnusson PK, Smith EN, Ulivi S, Warrington NM, Zgaga L, Alavere H, Amin N, Aspelund T, Bandinelli S, Barroso I, Berenson GS, Bergmann S, Blackburn H, Boerwinkle E, Buring JE, Busonero F, Campbell H, Chanock SJ, Chen W, Cornelis MC, Couper D, Coviello AD, d'Adamo P, de Faire U, de Geus EJ, Deloukas P, Döring A, Smith GD, Easton DF, Eiriksdottir G, Emilsson V, Eriksson J, Ferrucci L, Folsom AR, Foroud T, Garcia M, Gasparini P, Geller F, Gieger C, GIANT Consortium, Gudnason V, Hall P, Hankinson SE, Ferreli L, Heath AC, Hernandez DG, Hofman A, Hu FB, Illig T, Järvelin MR, Johnson AD, Karasik D, Khaw KT, Kiel DP, Kilpeläinen TO, Kolcic I, Kraft P, Launer LJ, Laven JS, Li S, Liu J, Levy D, Martin NG, McArdle WL, Melbye M, Mooser V, Murray JC, Murray SS, Nalls MA, Navarro P, Nelis M, Ness AR, Northstone K, Oostra BA, Peacock M, Palmer LJ, Palotie A, Paré G, Parker AN, Pedersen NL, Peltonen L, Pennell CE, Pharoah P, Polasek O, Plump AS, Pouta A, Porcu E, Rafnar T, Rice JP, Ring SM, Rivadeneira F, Rudan I, Sala C, Salomaa V, Sanna S, Schlessinger D, Schork NJ, Scuteri A, Segrè AV, Shuldiner AR, Soranzo N, Sovio U, Srinivasan SR, Strachan DP, Tammesoo ML, Tikkanen E, Toniolo D, Tsui K, Tryggvadottir L, Tyrer J, Uda M, van Dam RM, van Meurs JB, Vollenweider P, Waeber G, Wareham NJ, Waterworth DM, Weedon MN, Wichmann HE, Willemsen G, Wilson JF, Wright AF, Young L, Zhai G, Zhuang WV, Bierut LJ, Boomsma DI, Boyd HA, Crisponi L, Demerath EW, van Duijn CM, Econs MJ, Harris TB, Hunter DJ, Loos RJ, Metspalu A, Montgomery GW, Ridker PM, Spector TD, Streeten EA, Stefansson K, Thorsteinsdottir U, Uitterlinden AG, Widen E, Murabito JM, Ong KK and Murray A

    Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.

    To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P = 5.4 × 10⁻⁶⁰) and 9q31.2 (P = 2.2 × 10⁻³³), we identified 30 new menarche loci (all P < 5 × 10⁻⁸) and found suggestive evidence for a further 10 loci (P < 1.9 × 10⁻⁶). The new loci included four previously associated with body mass index (in or near FTO, SEC16B, TRA2B and TMEM18), three in or near other genes implicated in energy homeostasis (BSX, CRTC1 and MCHR2) and three in or near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and gene-set enrichment pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.

    Funded by: Canadian Institutes of Health Research: 166067; Cancer Research UK: 10118, 11022, A10119, A10124; Chief Scientist Office: CZB/4/710; Medical Research Council: G0000934, G0401527, G0500539, G0600705, G0701863, G0801056B, G9815508, MC_U106179471, MC_U106179472, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA089392, CA104021, CA136792, CA40356, CA54281, CA63464, CA98233, P01 CA055075, P01 CA055075-17, P01 CA087969, P01 CA087969-13, P01 CA089392, P01 CA089392-08, P01 CA089392-09, P01CA055075, P01CA087969, R01 CA040356-15S1, R01 CA047988, R01 CA047988-20, R01 CA063464, R01 CA063464-10, R01 CA104021-05, R37 CA054281, R37 CA054281-17, U01 CA098233, U01 CA098233-08, U01 CA136792, U01 CA136792-03, Z01 CP010200-03, Z01CP010200; NCRR NIH HHS: M01 RR 16500, M01 RR-00750, M01 RR000750-31, M01 RR016500-04, U54RR025204-01, UL1 RR025005, UL1 RR025005-05, UL1 RR025774, UL1 RR025774-05, UL1RR025005; NHGRI NIH HHS: U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, U01 HG004415-02, U01 HG004422, U01 HG004422-01, U01 HG004422-02, U01 HG004423, U01 HG004423-01, U01 HG004424-04, U01 HG004436, U01 HG004436-02, U01 HG004438, U01 HG004438-04, U01 HG004446, U01 HG004446-04, U01 HG004726, U01 HG004726-02, U01 HG004728, U01 HG004728-01, U01 HG004729, U01 HG004729-02, U01 HG004735, U01 HG004735-02, U01 HG004738, U01 HG004738-02, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004728, U01HG004729, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL 043851, HL087679, HL69757, N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-11, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, R01HL086694, R01HL087641, R01HL59367, RC2 HL102419, RC2 HL102419-02, U01 HL072515, U01 HL072515-06, U01 HL084756, U01 HL084756-03, U01 HL84756, U01HL72515, U19 HL069757, U19 HL069757-11; NIA NIH HHS: AG-16592, N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01 AG012109, N01 AG050002, N01-AG-1-2109, N01-AG-12100, N01-AG-5-0002, P01 AG018397, P01 AG018397-08, P01 AG025204-03, P01-AG-18397, R01 AG016592, R01 AG016592-10, R01 AG041517, R01 AR/AG 41398, R21 AG032598, R21 AG032598-02, R21AG032598; NIAAA NIH HHS: AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, K05 AA017688, R01 AA007535, R01 AA007535-08, R01 AA013320, R01 AA013320-05, R01 AA013321, R01 AA013321-05, R01 AA013326-05, R01 AA014041-05, U10 AA008401, U10 AA008401-23, U10AA008401; NIAMS NIH HHS: R01 AR041398, R01 AR041398-15, R01 AR041398-20; NICHD NIH HHS: HD-061437, R03 HD061437, R03 HD061437-02; NIDA NIH HHS: R01 DA012854, R01 DA012854-09, R01 DA013423, R01 DA013423-05, R01 DA019963, R01 DA019963-01A2, R01 DA019963-02, R01 DA019963-03, R01-DA013423; NIDCR NIH HHS: U01 DE018903, U01 DE018903-02, U01 DE018993, U01 DE018993-01, U01DE018903, U01DE018993; NIDDK NIH HHS: P30 DK072488, R01 DK058845, R01 DK058845-11, R01DK058845, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH66206, R01 MH066206, R01 MH066206-05; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; PHS HHS: HHSN268200625226C, HHSN268200782096C, R01-088119, RFAHG006033; Wellcome Trust: 068545/Z/02, 076467/Z/05/Z, 077016/Z/05/Z, 079895, 89061/Z/09/Z

    Nature genetics 2010;42;12;1077-85

  • A high-throughput pharmaceutical screen identifies compounds with specific toxicity against BRCA2-deficient tumors.

    Evers B, Schut E, van der Burg E, Braumuller TM, Egan DA, Holstege H, Edser P, Adams DJ, Wade-Martins R, Bouwman P and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, the Netherlands.

    Purpose: Hereditary breast cancer is partly explained by germline mutations in BRCA1 and BRCA2. Although patients carry heterozygous mutations, their tumors have typically lost the remaining wild-type allele. Selectively targeting BRCA deficiency may therefore constitute an important therapeutic approach. Clinical trials applying this principle are underway, but it is unknown whether the compounds tested are optimal. It is therefore important to identify alternative compounds that specifically target BRCA deficiency and to test new combination therapies to establish optimal treatment strategies.

    Experimental design: We did a high-throughput pharmaceutical screen on BRCA2-deficient mouse mammary tumor cells and isogenic controls with restored BRCA2 function. Subsequently, we validated positive hits in vitro and in vivo using mice carrying BRCA2-deficient mammary tumors.

    Results: Three alkylators (chlorambucil, melphalan, and nimustine) displayed strong and specific toxicity against BRCA2-deficient cells. In vivo, these showed heterogeneous but generally strong BRCA2-deficient antitumor activity, with melphalan and nimustine doing better than cisplatin and the poly-(ADP-ribose)-polymerase inhibitor olaparib (AZD2281) in this small study. In vitro drug combination experiments showed synergistic interactions between the alkylators and olaparib. Tumor intervention studies combining nimustine and olaparib resulted in recurrence-free survival exceeding 330 days in 3 of 5 animals tested.

    Conclusions: We generated and validated a platform for identification of compounds with specific activity against BRCA2-deficient cells that translates well to the preclinical setting. Our data call for the re-evaluation of alkylators, especially melphalan and nimustine, alone or in combination with the poly-(ADP-ribose)-polymerase inhibitors, for the treatment of breast cancers with a defective BRCA pathway.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D012910/1; Cancer Research UK; Wellcome Trust

    Clinical cancer research : an official journal of the American Association for Cancer Research 2010;16;1;99-108

  • The genetics of obesity: FTO leads the way.

    Fawcett KA and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.

    In 2007, an association of single nucleotide polymorphisms (SNPs) in the fat mass and obesity-associated (FTO) gene region with body mass index (BMI) and risk of obesity was identified in multiple populations, making FTO the first locus unequivocally associated with adiposity. At the time, FTO was a gene of unknown function and it was not known whether these SNPs exerted their effect on adiposity by affecting FTO or neighboring genes. Therefore, this breakthrough association inspired a wealth of in silico, in vitro, and in vivo analyses in model organisms and humans to improve knowledge of FTO function. These studies suggested that FTO plays a role in controlling feeding behavior and energy expenditure. Here, we review the approaches taken that provide a blueprint for the study of other obesity-associated genes in the hope that this strategy will result in increased understanding of the biological mechanisms underlying body weight regulation.

    Funded by: Wellcome Trust: 077016/Z/05/Z

    Trends in genetics : TIG 2010;26;6;266-74

  • Detailed investigation of the role of common and low-frequency WFS1 variants in type 2 diabetes risk.

    Fawcett KA, Wheeler E, Morris AP, Ricketts SL, Hallmans G, Rolandsson O, Daly A, Wasson J, Permutt A, Hattersley AT, Glaser B, Franks PW, McCarthy MI, Wareham NJ, Sandhu MS and Barroso I

    Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.

    Objective: Wolfram syndrome 1 (WFS1) single nucleotide polymorphisms (SNPs) are associated with risk of type 2 diabetes. In this study we aimed to refine this association and investigate the role of low-frequency WFS1 variants in type 2 diabetes risk.

    Research design and methods: For fine-mapping, we sequenced WFS1 exons, splice junctions, and conserved noncoding sequences in samples from 24 type 2 diabetic case and 68 control subjects, selected tagging SNPs, and genotyped these in 959 U.K. type 2 diabetic case and 1,386 control subjects. The same genomic regions were sequenced in samples from 1,235 type 2 diabetic case and 1,668 control subjects to compare the frequency of rarer variants between case and control subjects.

    Results: Of 31 tagging SNPs, the strongest associated was the previously untested 3' untranslated region rs1046320 (P = 0.008); odds ratio 0.84 and P = 6.59 × 10⁻⁷ on further replication in 3,753 case and 4,198 control subjects. High correlation between rs1046320 and the original strongest SNP (rs10010131) (r² = 0.92) meant that we could not differentiate between their effects in our samples. There was no difference in the cumulative frequency of 82 rare (minor allele frequency [MAF] <0.01) nonsynonymous variants between type 2 diabetic case and control subjects (P = 0.79). Two intermediate frequency (MAF 0.01-0.05) nonsynonymous changes also showed no statistical association with type 2 diabetes.

    Conclusions: We identified six highly correlated SNPs that show strong and comparable associations with risk of type 2 diabetes, but further refinement of these associations will require large sample sizes (>100,000) or studies in ethnically diverse populations. Low frequency variants in WFS1 are unlikely to have a large impact on type 2 diabetes risk in white U.K. populations, highlighting the complexities of undertaking association studies with low-frequency variants identified by resequencing.

    Funded by: British Heart Foundation; Medical Research Council: MC_U106179471; Wellcome Trust: 064890, 077016, 077016/Z/05/Z, 081682

    Diabetes 2010;59;3;741-6

  • Characterization of a hotspot for mimicry: assembly of a butterfly wing transcriptome to genomic sequence at the HmYb/Sb locus.

    Ferguson L, Lee SF, Chamberlain N, Nadeau N, Joron M, Baxter S, Wilkinson P, Papanicolaou A, Kumar S, Kee TJ, Clark R, Davidson C, Glithero R, Beasley H, Vogel H, Ffrench-Constant R and Jiggins C

    Department of Zoology, University of Cambridge, UK.

    The mimetic wing patterns of Heliconius butterflies are an excellent example of both adaptive radiation and convergent evolution. Alleles at the HmYb and HmSb loci control the presence/absence of hindwing bar and hindwing margin phenotypes respectively between divergent races of Heliconius melpomene, and also between sister species. Here, we used fine-scale linkage mapping to identify and sequence a BAC tilepath across the HmYb/Sb loci. We also generated transcriptome sequence data for two wing pattern forms of H. melpomene that differed in HmYb/Sb alleles using 454 sequencing technology. Custom scripts were used to process the sequence traces and generate transcriptome assemblies. Genomic sequence for the HmYb/Sb candidate region was annotated both using the MAKER pipeline and manually using transcriptome sequence reads. In total, 28 genes were identified in the HmYb/Sb candidate region, six of which have alternative splice forms. None of these are orthologues of genes previously identified as being expressed in butterfly wing pattern development, implying previously undescribed molecular mechanisms of pattern determination on Heliconius wings. The use of next-generation sequencing has therefore facilitated DNA annotation of a poorly characterized genome, and generated hypotheses regarding the identity of wing pattern at the HmYb/Sb loci.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E008836/1, BB/E011845/1, BB/G00661X/1; Medical Research Council: G0900740

    Molecular ecology 2010;19 Suppl 1;240-54

  • The Pfam protein families database.

    Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR and Bateman A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. rdf@sanger.ac.uk

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; Wellcome Trust: 087656, WT077044/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D211-22
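
    The Pfam abstract above attributes part of HMMER3's sensitivity to routine use of the forward algorithm. As a rough illustration only, and not HMMER3's actual implementation, the Python sketch below computes a forward probability for a toy two-state hidden Markov model. The state names, the probabilities and the function name `forward` are all hypothetical. Unlike a best-path (Viterbi) score, the forward score sums over every state path that could have emitted the sequence.

```python
# Toy two-state HMM; every name and number here is an illustrative assumption.
STATES = ("M", "I")
START = {"M": 0.6, "I": 0.4}
TRANS = {"M": {"M": 0.7, "I": 0.3}, "I": {"M": 0.4, "I": 0.6}}
EMIT = {"M": {"A": 0.5, "C": 0.5}, "I": {"A": 0.1, "C": 0.9}}


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over all hidden state paths."""
    # alpha[s] = probability of the symbols seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


score = forward("AC", STATES, START, TRANS, EMIT)
print(f"forward probability of 'AC': {score:.4f}")
```

    Because the toy model emits exactly one symbol per step, the forward probabilities of all sequences of a given length sum to one, which gives a quick sanity check on the recursion.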

  • Ensembl's 10th year.

    Flicek P, Aken BL, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Gräf S, Haider S, Hammond M, Howe K, Jenkinson A, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Koscielny G, Kulesha E, Lawson D, Longden I, Massingham T, McLaren W, Megy K, Overduin B, Pritchard B, Rios D, Ruffier M, Schuster M, Slater G, Smedley D, Spudich G, Tang YA, Trevanion S, Vilella A, Vogel J, White S, Wilder SP, Zadissa A, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Smith J and Searle SM

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. flicek@ebi.ac.uk

    Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BBE0116401, BBS/B/13438, BBS/B/13462; Wellcome Trust: 062023, 077198

    Nucleic acids research 2010;38;Database issue;D557-62

  • Evaluating the discriminative power of multi-trait genetic risk scores for type 2 diabetes in a northern Swedish population.

    Fontaine-Bisson B, Renström F, Rolandsson O, MAGIC, Payne F, Hallmans G, Barroso I and Franks PW

    Department of Nutrition Sciences, University of Ottawa, Ottawa, ON, Canada.

    Aims/hypothesis: We determined whether single nucleotide polymorphisms (SNPs) previously associated with diabetogenic traits improve the discriminative power of a type 2 diabetes genetic risk score.

    Methods: Participants (n = 2,751) were genotyped for 73 SNPs previously associated with type 2 diabetes, fasting glucose/insulin concentrations, obesity or lipid levels, from which five genetic risk scores (one for each of the four traits and one combining all SNPs) were computed. Type 2 diabetes patients and non-diabetic controls (n = 1,327/1,424) were identified using medical records in addition to an independent oral glucose tolerance test.

    Results: Model 1, including only SNPs associated with type 2 diabetes, had a discriminative power of 0.591 (p < 1.00 × 10⁻²⁰ vs null model) as estimated by the area under the receiver operator characteristic curve (ROC AUC). Model 2, including only fasting glucose/insulin SNPs, had a significantly higher discriminative power than the null model (ROC AUC 0.543; p = 9.38 × 10⁻⁶ vs null model), but lower discriminative power than model 1 (p = 5.92 × 10⁻⁵). Model 3, with only lipid-associated SNPs, had significantly higher discriminative power than the null model (ROC AUC 0.565; p = 1.44 × 10⁻⁹) and was not statistically different from model 1 (p = 0.083). The ROC AUC of model 4, which included only obesity SNPs, was 0.557 (p = 2.30 × 10⁻⁷ vs null model) and smaller than model 1 (p = 0.025). Finally, the model including all SNPs yielded a significant improvement in discriminative power compared with the null model (p < 1.0 × 10⁻²⁰) and model 1 (p = 1.32 × 10⁻⁵); its ROC AUC was 0.626.

    Conclusions/interpretation: Adding SNPs previously associated with fasting glucose, insulin, lipids or obesity to a genetic risk score for type 2 diabetes significantly increases the power to discriminate between people with and without clinically manifest type 2 diabetes compared with a model including only conventional type 2 diabetes loci.

    Funded by: Wellcome Trust: 077016/Z/05/Z

    Diabetologia 2010;53;10;2155-62
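
    The discriminative power reported in the abstract above is the ROC AUC of a genetic risk score. The Python sketch below shows one way such a figure could be computed, assuming an unweighted score (a simple count of risk alleles) and made-up genotypes. The abstract does not state how the scores were weighted, so `risk_score`, `roc_auc` and the toy data are hypothetical and do not reproduce the study's method or results.

```python
def risk_score(genotypes):
    """Unweighted genetic risk score: total risk alleles (0, 1 or 2 per SNP)."""
    return sum(genotypes)


def roc_auc(case_scores, control_scores):
    """Probability that a random case outscores a random control.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up risk-allele counts at three SNPs per participant (illustrative only).
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[0, 1, 0], [1, 1, 0], [1, 0, 0]]

case_scores = [risk_score(g) for g in cases]
control_scores = [risk_score(g) for g in controls]
auc = roc_auc(case_scores, control_scores)
print(f"ROC AUC = {auc:.3f}")
```

    An AUC of 0.5 means the score separates cases from controls no better than chance, which is why values such as 0.591 or 0.626 indicate only modest discrimination.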

  • COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer.

    Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A, Teague JW, Stratton MR and Futreal PA

    Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed through 13,423 genes in almost 370,000 tumours, describing over 90,000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC and while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer and COSMIC is now displaying details of these analyses also. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D652-7

  • Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

    Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Green T, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Raychaudhuri S, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Büning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Verspaget HW, Hugot JP, Karban A, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Russell R, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Wilson D, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma RK, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ and Parkes M

    Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany.

    We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: M01-RR00425; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, R01 HL087652, U01 HL080295; NIAMS NIH HHS: K08 AR055688, K08 AR055688-01A1S1, K08 AR055688-03, K08 AR055688-04; NIDDK NIH HHS: DK 063491, DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, DK069513, DK084554, DK76984, P01-DK046763, P30 DK043351, R01 DK064869; Wellcome Trust: 089120, WT089120/Z/09/Z

    Nature genetics 2010;42;12;1118-25

  • Nonobese diabetic congenic strain analysis of autoimmune diabetes reveals genetic complexity of the Idd18 locus and identifies Vav3 as a candidate gene.

    Fraser HI, Dendrou CA, Healy B, Rainbow DB, Howlett S, Smink LJ, Gregory S, Steward CA, Todd JA, Peterson LB and Wicker LS

    Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge.

    We have used the public sequencing and annotation of the mouse genome to delimit the previously resolved type 1 diabetes (T1D) insulin-dependent diabetes (Idd)18 interval to a region on chromosome 3 that includes the immunologically relevant candidate gene, Vav3. To test the candidacy of Vav3, we developed a novel congenic strain that enabled the resolution of Idd18 to a 604-kb interval, designated Idd18.1, which contains only two annotated genes: the complete sequence of Vav3 and the last exon of the gene encoding NETRIN G1, Ntng1. Targeted sequencing of Idd18.1 in the NOD mouse strain revealed that allelic variation between NOD and C57BL/6J (B6) occurs in noncoding regions with 138 single nucleotide polymorphisms concentrated in the introns between exons 20 and 27 and immediately after the 3' untranslated region. We observed differential expression of VAV3 RNA transcripts in thymocytes when comparing congenic mouse strains with B6 or NOD alleles at Idd18.1. The T1D protection associated with B6 alleles of Idd18.1/Vav3 requires the presence of B6 protective alleles at Idd3, which are correlated with increased IL-2 production and regulatory T cell function. In the absence of B6 protective alleles at Idd3, we detected a second T1D protective B6 locus, Idd18.3, which is closely linked to, but distinct from, Idd18.1. Therefore, genetic mapping, sequencing, and gene expression evidence indicate that alteration of VAV3 expression is an etiological factor in the development of autoimmune beta-cell destruction in NOD mice. This study also demonstrates that a congenic strain mapping approach can isolate closely linked susceptibility genes.

    Funded by: NIAID NIH HHS: AI 15416; Wellcome Trust: 061858, 061859, 079895

    Journal of immunology (Baltimore, Md. : 1950) 2010;184;9;5075-84

  • Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight.

    Freathy RM, Mook-Kanamori DO, Sovio U, Prokopenko I, Timpson NJ, Berry DJ, Warrington NM, Widen E, Hottenga JJ, Kaakinen M, Lange LA, Bradfield JP, Kerkhof M, Marsh JA, Mägi R, Chen CM, Lyon HN, Kirin M, Adair LS, Aulchenko YS, Bennett AJ, Borja JB, Bouatia-Naji N, Charoen P, Coin LJ, Cousminer DL, de Geus EJ, Deloukas P, Elliott P, Evans DM, Froguel P, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Glaser B, Groves CJ, Hartikainen AL, Hassanali N, Hirschhorn JN, Hofman A, Holly JM, Hyppönen E, Kanoni S, Knight BA, Laitinen J, Lindgren CM, Meta-Analyses of Glucose and Insulin-related traits Consortium, McArdle WL, O'Reilly PF, Pennell CE, Postma DS, Pouta A, Ramasamy A, Rayner NW, Ring SM, Rivadeneira F, Shields BM, Strachan DP, Surakka I, Taanila A, Tiesler C, Uitterlinden AG, van Duijn CM, Wellcome Trust Case Control Consortium, Wijga AH, Willemsen G, Zhang H, Zhao J, Wilson JF, Steegers EA, Hattersley AT, Eriksson JG, Peltonen L, Mohlke KL, Grant SF, Hakonarson H, Koppelman GH, Dedoussis GV, Heinrich J, Gillman MW, Palmer LJ, Frayling TM, Boomsma DI, Davey Smith G, Power C, Jaddoe VW, Jarvelin MR, Early Growth Genetics (EGG) Consortium and McCarthy MI

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK.

    To identify genetic variants associated with birth weight, we meta-analyzed six genome-wide association (GWA) studies (n = 10,623 Europeans from pregnancy/birth cohorts) and followed up two lead signals in 13 replication studies (n = 27,591). rs900400 near LEKR1 and CCNL1 (P = 2 × 10⁻³⁵) and rs9883204 in ADCY5 (P = 7 × 10⁻¹⁵) were robustly associated with birth weight. Correlated SNPs in ADCY5 were recently implicated in regulation of glucose levels and susceptibility to type 2 diabetes, providing evidence that the well-described association between lower birth weight and subsequent type 2 diabetes has a genetic component, distinct from the proposed role of programming by maternal nutrition. Using data from both SNPs, we found that the 9% of Europeans carrying four birth weight-lowering alleles were, on average, 113 g (95% CI 89-137 g) lighter at birth than the 24% with zero or one alleles (P(trend) = 7 × 10⁻³⁰). The impact on birth weight is similar to that of a mother smoking 4-5 cigarettes per day in the third trimester of pregnancy.

    Funded by: British Heart Foundation; Canadian Institutes of Health Research: MOP 82893; Chief Scientist Office: CZB/4/710; Department of Health: PHCS/C4/4/016; Diabetes UK: 08/0003692; FIC NIH HHS: TW05596; Medical Research Council: G0000934, G0500070, G0500539, G0600331, G0600705, G0601261, G0601653, G0700704B, G0800582, G0801056, G9815508; NCRR NIH HHS: RR20649; NHLBI NIH HHS: HL068041, HL085144, HL0876792; NICHD NIH HHS: HD034568, HD05450, HD056465, R24 HD050924; NIDDK NIH HHS: 1R01DK075787, DK075787, DK078150, DK56350; NIEHS NIH HHS: ES10126; NIMH NIH HHS: MH083268, MH63706; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 085301, 085541, 090532, 89061/Z/09/Z

    Nature genetics 2010;42;5;430-5

  • Mouse welfare terms

    Gardiner M, Wells S, Trower C, Salisbury J, Mallon AM, Beck T, Melvin D and Bussell J

    Animal Technology and Welfare. 2010;9;175

  • A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1.

    Genetic Analysis of Psoriasis Consortium & the Wellcome Trust Case Control Consortium 2, Strange A, Capon F, Spencer CC, Knight J, Weale ME, Allen MH, Barton A, Band G, Bellenguez C, Bergboer JG, Blackwell JM, Bramon E, Bumpstead SJ, Casas JP, Cork MJ, Corvin A, Deloukas P, Dilthey A, Duncanson A, Edkins S, Estivill X, Fitzgerald O, Freeman C, Giardina E, Gray E, Hofer A, Hüffmeier U, Hunt SE, Irvine AD, Jankowski J, Kirby B, Langford C, Lascorz J, Leman J, Leslie S, Mallbris L, Markus HS, Mathew CG, McLean WH, McManus R, Mössner R, Moutsianas L, Naluai AT, Nestle FO, Novelli G, Onoufriadis A, Palmer CN, Perricone C, Pirinen M, Plomin R, Potter SC, Pujol RM, Rautanen A, Riveira-Munoz E, Ryan AW, Salmhofer W, Samuelsson L, Sawcer SJ, Schalkwijk J, Smith CH, Ståhle M, Su Z, Tazi-Ahnini R, Traupe H, Viswanathan AC, Warren RB, Weger W, Wolk K, Wood N, Worthington J, Young HS, Zeeuwen PL, Hayday A, Burden AD, Griffiths CE, Kere J, Reis A, McVean G, Evans DM, Brown MA, Barker JN, Peltonen L, Donnelly P and Trembath RC

    Wellcome Trust Centre for Human Genetics, Oxford, UK.

    To identify new susceptibility loci for psoriasis, we undertook a genome-wide association study of 594,224 SNPs in 2,622 individuals with psoriasis and 5,667 controls. We identified associations at eight previously unreported genomic loci. Seven loci harbored genes with recognized immune functions (IL28RA, REL, IFIH1, ERAP1, TRAF3IP2, NFKBIA and TYK2). These associations were replicated in 9,079 European samples (six loci with a combined P < 5 × 10⁻⁸ and two loci with a combined P < 5 × 10⁻⁷). We also report compelling evidence for an interaction between the HLA-C and ERAP1 loci (combined P = 6.95 × 10⁻⁶). ERAP1 plays an important role in MHC class I peptide processing. ERAP1 variants only influenced psoriasis susceptibility in individuals carrying the HLA-C risk allele. Our findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis.

    Funded by: Department of Health; Medical Research Council: G0000934, G0601387; Wellcome Trust: 068545/Z/02, 083948/Z/07/Z, 084726

    Nature genetics 2010;42;11;985-90

  • Transcription profiling in human platelets reveals LRRFIP1 as a novel protein regulating platelet function.

    Goodall AH, Burns P, Salles I, Macaulay IC, Jones CI, Ardissino D, de Bono B, Bray SL, Deckmyn H, Dudbridge F, Fitzgerald DJ, Garner SF, Gusnanto A, Koch K, Langford C, O'Connor MN, Rice CM, Stemple D, Stephens J, Trip MD, Zwaginga JJ, Samani NJ, Watkins NA, Maguire PB, Ouwehand WH and Bloodomics Consortium

    Department of Cardiovascular Science, University of Leicester, Clinical Sciences Wing, Glenfield Hospital, Leicester, UK. ahg5@le.ac.uk

    Within the healthy population, there is substantial, heritable, and interindividual variability in the platelet response. We explored whether a proportion of this variability could be accounted for by interindividual variation in gene expression. Through a correlative analysis of genome-wide platelet RNA expression data from 37 subjects representing the normal range of platelet responsiveness within a cohort of 500 subjects, we identified 63 genes in which transcript levels correlated with variation in the platelet response to adenosine diphosphate and/or the collagen-mimetic peptide, cross-linked collagen-related peptide. Many of these encode proteins with no reported function in platelets. An association study of 6 of the 63 genes in 4235 cases and 6379 controls showed a putative association with myocardial infarction for COMMD7 (COMM domain-containing protein 7) and a major deviation from the null hypothesis for LRRFIP1 [leucine-rich repeat (in FLII) interacting protein 1]. Morpholino-based silencing in Danio rerio identified a modest role for commd7 and a significant effect for lrrfip1 as positive regulators of thrombus formation. Proteomic analysis of human platelet LRRFIP1-interacting proteins indicated that LRRFIP1 functions as a component of the platelet cytoskeleton, where it interacts with the actin-remodeling proteins Flightless-1 and Drebrin. Taken together, these data reveal novel proteins regulating the platelet response.

    Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MC_U105292688

    Blood 2010;116;22;4646-56

  • Computing behaviour in complex synapses

    Grant SG

    Biochemist. 2010;32;6-9

  • Rare copy number variants: a point of rarity in genetic risk for bipolar disorder and schizophrenia.

    Grozeva D, Kirov G, Ivanov D, Jones IR, Jones L, Green EK, St Clair DM, Young AH, Ferrier N, Farmer AE, McGuffin P, Holmans PA, Owen MJ, O'Donovan MC, Craddock N and Wellcome Trust Case Control Consortium

    Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff CF14 4XN, Wales, UK.

    Context: Recent studies suggest that copy number variation in the human genome is extensive and may play an important role in susceptibility to disease, including neuropsychiatric disorders such as schizophrenia and autism. The possible involvement of copy number variants (CNVs) in bipolar disorder has received little attention to date.

    Objectives: To determine whether large (>100,000 base pairs) and rare (found in <1% of the population) CNVs are associated with susceptibility to bipolar disorder and to compare with findings in schizophrenia.

    Design: A genome-wide survey of large, rare CNVs in a case-control sample using a high-density microarray.

    Setting: The Wellcome Trust Case Control Consortium.

    Participants: There were 1697 cases of bipolar disorder and 2806 nonpsychiatric controls. All participants were white UK residents.

    Main outcome measures: Overall load of CNVs and presence of rare CNVs.

    Results: The burden of CNVs in bipolar disorder was not increased compared with controls and was significantly less than in schizophrenia cases. The CNVs previously implicated in the etiology of schizophrenia were not more common in cases with bipolar disorder.

    Conclusions: Schizophrenia and bipolar disorder differ with respect to CNV burden in general and association with specific CNVs in particular. Our data are consistent with the possibility that possession of large, rare deletions may modify the phenotype in those at risk of psychosis: those possessing such events are more likely to be diagnosed as having schizophrenia, and those without them are more likely to be diagnosed as having bipolar disorder.

    Funded by: Chief Scientist Office: CZB/4/540, ETM/75; Medical Research Council: G0600329, G0701003, G0701420, G0800509, G0800759, G90/106, G9817803B; Wellcome Trust: 061858, 090532

    Archives of general psychiatry 2010;67;4;318-27

  • Being more realistic about the public health impact of genomic medicine.

    Hall WD, Mathews R and Morley KI

    University of Queensland Centre for Clinical Research, The University of Queensland, Herston, Queensland, Australia. w.hall@uq.edu.au

    PLoS medicine 2010;7;10

  • A pharmacometric model describing the relationship between warfarin dose and INR response with respect to variations in CYP2C9, VKORC1, and age.

    Hamberg AK, Wadelius M, Lindh JD, Dahl ML, Padrini R, Deloukas P, Rane A and Jonsson EN

    Department of Medical Sciences, Clinical Pharmacology, Uppsala University Hospital, Uppsala, Sweden. anna-karin.hamberg@medsci.uu.se

    The objective of the study was to update a previous NONMEM model to describe the relationship between warfarin dose and international normalized ratio (INR) response, to decrease the dependence of the model on pharmacokinetic (PK) data, and to improve the characterization of rare genotype combinations. The effects of age and CYP2C9 genotype on S-warfarin clearance were estimated from high-quality PK data. Thereafter, a temporal dose-response (K-PD) model was developed from information on dose, INR, age, and CYP2C9 and VKORC1 genotype, with drug clearance as a covariate. Two transit compartment chains accounted for the delay between exposure and response. CYP2C9 genotype was identified as the single most important predictor of required dose, causing a difference of up to 4.2-fold in the maintenance dose. VKORC1 accounted for a difference of up to 2.1-fold in dose, and age reduced the dose requirement by ~6% per decade. This reformulated K-PD model decreases dependence on PK data and enables robust assessment of INR response and dose predictions, even in individuals with rare genotype combinations.

    Clinical pharmacology and therapeutics 2010;87;6;727-34

  • KSHV-encoded miRNAs target MAF to induce endothelial cell reprogramming.

    Hansen A, Henderson S, Lagos D, Nikitenko L, Coulter E, Roberts S, Gratrix F, Plaisance K, Renne R, Bower M, Kellam P and Boshoff C

    Cancer Research UK Viral Oncology Group, University College London Cancer Institute, University College London, London WC1E 6BT, United Kingdom.

    Kaposi sarcoma herpesvirus (KSHV) induces transcriptional reprogramming of endothelial cells. In particular, KSHV-infected lymphatic endothelial cells (LECs) show an up-regulation of genes associated with blood vessel endothelial cells (BECs). Consequently, KSHV-infected tumor cells in Kaposi sarcoma are poorly differentiated endothelial cells, expressing markers of both LECs and BECs. MicroRNAs (miRNAs) are short noncoding RNA molecules that act post-transcriptionally to negatively regulate gene expression. Here we validate expression of the KSHV-encoded miRNAs in Kaposi sarcoma lesions and demonstrate that these miRNAs contribute to viral-induced reprogramming by silencing the cellular transcription factor MAF (musculoaponeurotic fibrosarcoma oncogene homolog). MAF is expressed in LECs but not in BECs. We identify a novel role for MAF as a transcriptional repressor, preventing expression of BEC-specific genes, thereby maintaining the differentiation status of LECs. These findings demonstrate that viral miRNAs could influence the differentiation status of infected cells, and thereby contribute to KSHV-induced oncogenesis.

    Funded by: Cancer Research UK; Medical Research Council: G0800168

    Genes & development 2010;24;2;195-205

  • Evolution of MRSA during hospital transmission and intercontinental spread.

    Harris SR, Feil EJ, Holden MT, Quail MA, Nickerson EK, Chantratita N, Gardete S, Tavares A, Day N, Lindsay JA, Edgeworth JD, de Lencastre H, Parkhill J, Peacock SJ and Bentley SD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    Current methods for differentiating isolates of predominant lineages of pathogenic bacteria often do not provide sufficient resolution to define precise relationships. Here, we describe a high-throughput genomics approach that provides a high-resolution view of the epidemiology and microevolution of a dominant strain of methicillin-resistant Staphylococcus aureus (MRSA). This approach reveals the global geographic structure within the lineage, its intercontinental transmission through four decades, and the potential to trace person-to-person transmission within a hospital environment. The ability to interrogate and resolve bacterial populations is applicable to a range of infectious diseases, as well as microbial ecology.

    Funded by: Department of Health; Wellcome Trust: 076964

    Science (New York, N.Y.) 2010;327;5964;469-74

  • Evolutionary dynamics of Clostridium difficile over short and long time scales.

    He M, Sebaihia M, Lawley TD, Stabler RA, Dawson LF, Martin MJ, Holt KE, Seth-Smith HM, Quail MA, Rance R, Brooks K, Churcher C, Harris D, Bentley SD, Burrows C, Clark L, Corton C, Murray V, Rose G, Thurston S, van Tonder A, Walker D, Wren BW, Dougan G and Parkhill J

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.

    Clostridium difficile has rapidly emerged as the leading cause of antibiotic-associated diarrheal disease, with the transcontinental spread of various PCR ribotypes, including 001, 017, 027 and 078. However, the genetic basis for the emergence of C. difficile as a human pathogen is unclear. Whole genome sequencing was used to analyze genetic variation and virulence of a diverse collection of thirty C. difficile isolates, to determine both macro and microevolution of the species. Horizontal gene transfer and large-scale recombination of core genes have shaped the C. difficile genome over both short and long time scales. Phylogenetic analysis demonstrates C. difficile is a genetically diverse species, which has evolved within the last 1.1-85 million years. By contrast, the disease-causing isolates have arisen from multiple lineages, suggesting that virulence evolved independently in the highly epidemic lineages.

    Funded by: Wellcome Trust

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;16;7527-32

  • Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution.

    Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, Steinthorsdottir V, Thorleifsson G, Zillikens MC, Speliotes EK, Mägi R, Workalemahu T, White CC, Bouatia-Naji N, Harris TB, Berndt SI, Ingelsson E, Willer CJ, Weedon MN, Luan J, Vedantam S, Esko T, Kilpeläinen TO, Kutalik Z, Li S, Monda KL, Dixon AL, Holmes CC, Kaplan LM, Liang L, Min JL, Moffatt MF, Molony C, Nicholson G, Schadt EE, Zondervan KT, Feitosa MF, Ferreira T, Lango Allen H, Weyant RJ, Wheeler E, Wood AR, MAGIC, Estrada K, Goddard ME, Lettre G, Mangino M, Nyholt DR, Purcell S, Smith AV, Visscher PM, Yang J, McCarroll SA, Nemesh J, Voight BF, Absher D, Amin N, Aspelund T, Coin L, Glazer NL, Hayward C, Heard-Costa NL, Hottenga JJ, Johansson A, Johnson T, Kaakinen M, Kapur K, Ketkar S, Knowles JW, Kraft P, Kraja AT, Lamina C, Leitzmann MF, McKnight B, Morris AP, Ong KK, Perry JR, Peters MJ, Polasek O, Prokopenko I, Rayner NW, Ripatti S, Rivadeneira F, Robertson NR, Sanna S, Sovio U, Surakka I, Teumer A, van Wingerden S, Vitart V, Zhao JH, Cavalcanti-Proença C, Chines PS, Fisher E, Kulzer JR, Lecoeur C, Narisu N, Sandholt C, Scott LJ, Silander K, Stark K, Tammesoo ML, Teslovich TM, Timpson NJ, Watanabe RM, Welch R, Chasman DI, Cooper MN, Jansson JO, Kettunen J, Lawrence RW, Pellikka N, Perola M, Vandenput L, Alavere H, Almgren P, Atwood LD, Bennett AJ, Biffar R, Bonnycastle LL, Bornstein SR, Buchanan TA, Campbell H, Day IN, Dei M, Dörr M, Elliott P, Erdos MR, Eriksson JG, Freimer NB, Fu M, Gaget S, Geus EJ, Gjesing AP, Grallert H, Grässler J, Groves CJ, Guiducci C, Hartikainen AL, Hassanali N, Havulinna AS, Herzig KH, Hicks AA, Hui J, Igl W, Jousilahti P, Jula A, Kajantie E, Kinnunen L, Kolcic I, Koskinen S, Kovacs P, Kroemer HK, Krzelj V, Kuusisto J, Kvaloy K, Laitinen J, Lantieri O, Lathrop GM, Lokki ML, Luben RN, Ludwig B, McArdle WL, McCarthy A, Morken MA, Nelis M, Neville MJ, Paré G, Parker AN, Peden JF, Pichler I, Pietiläinen KH, Platou CG, Pouta A, Ridderstråle M, Samani NJ, Saramies J, Sinisalo 
J, Smit JH, Strawbridge RJ, Stringham HM, Swift AJ, Teder-Laving M, Thomson B, Usala G, van Meurs JB, van Ommen GJ, Vatin V, Volpato CB, Wallaschofski H, Walters GB, Widen E, Wild SH, Willemsen G, Witte DR, Zgaga L, Zitting P, Beilby JP, James AL, Kähönen M, Lehtimäki T, Nieminen MS, Ohlsson C, Palmer LJ, Raitakari O, Ridker PM, Stumvoll M, Tönjes A, Viikari J, Balkau B, Ben-Shlomo Y, Bergman RN, Boeing H, Smith GD, Ebrahim S, Froguel P, Hansen T, Hengstenberg C, Hveem K, Isomaa B, Jørgensen T, Karpe F, Khaw KT, Laakso M, Lawlor DA, Marre M, Meitinger T, Metspalu A, Midthjell K, Pedersen O, Salomaa V, Schwarz PE, Tuomi T, Tuomilehto J, Valle TT, Wareham NJ, Arnold AM, Beckmann JS, Bergmann S, Boerwinkle E, Boomsma DI, Caulfield MJ, Collins FS, Eiriksdottir G, Gudnason V, Gyllensten U, Hamsten A, Hattersley AT, Hofman A, Hu FB, Illig T, Iribarren C, Jarvelin MR, Kao WH, Kaprio J, Launer LJ, Munroe PB, Oostra B, Penninx BW, Pramstaller PP, Psaty BM, Quertermous T, Rissanen A, Rudan I, Shuldiner AR, Soranzo N, Spector TD, Syvanen AC, Uda M, Uitterlinden A, Völzke H, Vollenweider P, Wilson JF, Witteman JC, Wright AF, Abecasis GR, Boehnke M, Borecki IB, Deloukas P, Frayling TM, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, North KE, O'Connell JR, Peltonen L, Schlessinger D, Strachan DP, Hirschhorn JN, Assimes TL, Wichmann HE, Thorsteinsdottir U, van Duijn CM, Stefansson K, Cupples LA, Loos RJ, Barroso I, McCarthy MI, Fox CS, Mohlke KL and Lindgren CM

    Regensburg University Medical Center, Department of Epidemiology and Preventive Medicine, Regensburg, Germany.

    Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10⁻⁹ to P = 1.8 × 10⁻⁴⁰) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10⁻³ to P = 1.2 × 10⁻¹³). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.

    Funded by: British Heart Foundation; Chief Scientist Office: CZB/4/710; Department of Health; Intramural NIH HHS: Z01 HG000024-14; Medical Research Council: G0000934, G0401527, G0500115, G0501184, G0600331, G0600705, G0601261, G0701863, G0801056, G0801056B, G1000758B, G9521010, MC_QA137934, MC_U106179472, MC_U106188470, MC_U127561128, MC_UP_A390_1107; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262, R01 CA067262-14, U01 CA049449, U01 CA049449-21, U01 CA098233, U01 CA098233-08, ­U01-CA098233; NCRR NIH HHS: UL1 RR025005, UL1 RR025005-04, UL1-RR025005, ­UL1-RR025005; NHGRI NIH HHS: HG002651, HG005581, N01 HG065403, N01-HG-65403, R01 HG002651, R01 HG002651-05, RC2 HG005581, RC2 HG005581-02, T32 HG000040, T32 HG000040-14, U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, ­T32-HG00040, ­U01-HG004399, ­U01-HG004402; NHLBI NIH HHS: HL043851, HL084729, HL71981, K99 HL094535, K99 HL094535-02, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085080, N01 HC085081, N01 HC085082, N01 HC085083, N01 HC085084, N01 HC085085, N01 HC085086, N01-HC-55018, N01-HC55222, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981, R01 HL071981-07, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087647, R01 HL087647-03, R01 HL087652, R01 HL087652-03, R01 HL087679-03, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01-HL087647, R01-HL59367, U01 HL054527, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084729, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL72515, ­K99HL094535, ­N01-HC-25195, ­N01-HC-55019, ­N01-HC-55020, ­N01-HC-55021, ­N01-HC-55022, ­N01-HC15103, ­N01-HC35129, ­N01-HC45133, 
­N01-HC55015, ­N01-HC55016, ­N01-HC75150, ­N01-HC85079, ­N01-HC85080, ­N01-HC85081, ­N01-HC85082, ­N01-HC85083, ­N01-HC85084, ­N01-HC85085, ­N01-HC85086, ­R01-HL086694, ­R01-HL087641, ­R01-HL087679, ­R01-HL087700, ­R01-HL088119, ­R01­HL087652, ­U01-HL084756; NIA NIH HHS: N01 AG012100, N01 AG012109, N01-AG-1-2109, R01 AG031890, R01 AG031890-02, ­N01-AG-12100, ­R01-AG031890; NIDDK NIH HHS: DK062370, DK072193, DK075787, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30 DK046200, P30 DK046200-14, P30 DK072488, P30 DK072488-06, P60 DK020541, R01 DK056690, R01 DK058845, R01 DK058845-11, R01 DK068336, R01 DK068336-03, R01 DK072193, R01 DK072193-05, R01 DK073490, R01 DK073490-05, R01 DK075681, R01 DK075681-04, R01 DK075787, R01 DK075787-05, R01 DK089256, R01-DK068336, R01-DK075787, T32 DK007191, U01 DK062370, U01 DK062370-08, U01 DK062418, U01 DK062418-06, ­K23-DK080145, ­P30-DK072488, ­R01-DK-073490, ­R01-DK075681, ­R01-DK075787, ­U01-DK062418; NIGMS NIH HHS: U01 GM074518, U01 GM074518-05, ­U01-GM074518; NIMH NIH HHS: R01 MH063706-05, R01 MH084698, R01 MH084698-03, RL1 MH083268, RL1 MH083268-05, ­1RL1-MH083268-01, ­MH084698, ­R01-MH63706; PHS HHS: ­263-MA-410953; Wellcome Trust: 064890, 068545, 072960, 075491, 076113, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 081682, 083270, 085235, 085301, 086596, 088885, 089061, 090532, 091746, ­068545/Z/02, ­072960, ­076113/B/04/Z, ­091746/Z/10/Z, ­WT086596/Z/08/Z

    Nature genetics 2010;42;11;949-60

  • A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk.

    Heinig M, Petretto E, Wallace C, Bottolo L, Rotival M, Lu H, Li Y, Sarwar R, Langley SR, Bauerfeind A, Hummel O, Lee YA, Paskas S, Rintisch C, Saar K, Cooper J, Buchan R, Gray EE, Cyster JG, Cardiogenics Consortium, Erdmann J, Hengstenberg C, Maouche S, Ouwehand WH, Rice CM, Samani NJ, Schunkert H, Goodall AH, Schulz H, Roider HG, Vingron M, Blankenberg S, Münzel T, Zeller T, Szymczak S, Ziegler A, Tiret L, Smyth DJ, Pravenec M, Aitman TJ, Cambien F, Clayton D, Todd JA, Hubner N and Cook SA

    Max-Delbrück-Center for Molecular Medicine (MDC), Berlin, Germany.

    Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases that may not be apparent from genome-wide association studies alone. Recent advances in rat genomics are facilitating systems-genetics approaches. Here we report the use of integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven inflammatory network (IDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and which was regulated in multiple tissues by a locus on rat chromosome 15q25. We show that Epstein-Barr virus induced gene 2 (Ebi2, also known as Gpr183), which lies at this locus and controls B lymphocyte migration, is expressed in macrophages and regulates the IDIN. The human orthologous locus on chromosome 13q32 controlled the human equivalent of the IDIN, which was conserved in monocytes. IDIN genes were more likely to associate with susceptibility to type 1 diabetes (T1D), a macrophage-associated autoimmune disease, than randomly selected immune response genes (P = 8.85 × 10⁻⁶). The human locus controlling the IDIN was associated with the risk of T1D at single nucleotide polymorphism rs9585056 (P = 7.0 × 10⁻¹⁰; odds ratio, 1.15), which was one of five single nucleotide polymorphisms in this region associated with EBI2 (GPR183) expression. These data implicate IRF7 network genes and their regulatory locus in the pathogenesis of T1D.

    Funded by: British Heart Foundation: P301/10/0290; Medical Research Council: MC_U120061454, MC_U120085815, MC_U120097112; Wellcome Trust: 061858, 076113, 089989

    Nature 2010;467;7314;460-4

  • Genome sequence of a recently emerged, highly transmissible, multi-antibiotic- and antiseptic-resistant variant of methicillin-resistant Staphylococcus aureus, sequence type 239 (TW).

    Holden MT, Lindsay JA, Corton C, Quail MA, Cockfield JD, Pathak S, Batra R, Parkhill J, Bentley SD and Edgeworth JD

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom. mh3@sanger.ac.uk

    The 3.1-Mb genome of an outbreak methicillin-resistant Staphylococcus aureus (MRSA) strain (TW20) contains evidence of recently acquired DNA, including two large regions (635 kb and 127 kb). The strain is resistant to a wide range of antibiotics, antiseptics, and heavy metals due to resistance genes encoded on mobile genetic elements and also mutations in housekeeping genes.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;3;888-92

  • Emx2 and early hair cell development in the mouse inner ear.

    Holley M, Rhodes C, Kneebone A, Herde MK, Fleming M and Steel KP

    Department of Biomedical Science, Addison Building, Western Bank, Sheffield S10 2TN, UK. m.c.holley@sheffield.ac.uk

    Emx2 is a homeodomain protein that plays a critical role in inner ear development. Homozygous null mice die at birth with a range of defects in the CNS, renal system and skeleton. The cochlea is shorter than normal with about 60% fewer auditory hair cells. It appears to lack outer hair cells and some supporting cells are either absent or fail to differentiate. Many of the hair cells differentiate in pairs and although their hair bundles develop normally their planar cell polarity is compromised. Measurements of cell polarity suggest that classic planar cell polarity molecules are not directly influenced by Emx2 and that polarity is compromised by developmental defects in the sensory precursor population or by defects in epithelial cues for cell alignment. Planar cell polarity is normal in the vestibular epithelia although polarity reversal across the striola is absent in both the utricular and saccular maculae. In contrast, cochlear hair cell polarity is disorganized. The expression domain for Bmp4 is expanded and Fgfr1 and Prox1 are expressed in fewer cells in the cochlear sensory epithelium of Emx2 null mice. We conclude that Emx2 regulates early developmental events that balance cell proliferation and differentiation in the sensory precursor population.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Developmental biology 2010;340;2;547-56

  • Disease-associated XMRV sequences are consistent with laboratory contamination.

    Hué S, Gray ER, Gall A, Katzourakis A, Tan CP, Houldcroft CJ, McLaren S, Pillay D, Futreal A, Garson JA, Pybus OG, Kellam P and Towers GJ

    MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.

    Background: Xenotropic murine leukaemia viruses (MLV-X) are endogenous gammaretroviruses that infect cells from many species, including humans. Xenotropic murine leukaemia virus-related virus (XMRV) is a retrovirus that has been the subject of intense debate since its detection in samples from humans with prostate cancer (PC) and chronic fatigue syndrome (CFS). Controversy has arisen from the failure of some studies to detect XMRV in PC or CFS patients and from inconsistent detection of XMRV in healthy controls.

    Results: Here we demonstrate that TaqMan PCR primers previously described as XMRV-specific can amplify common murine endogenous viral sequences from mouse, suggesting that mouse DNA can contaminate patient samples and confound specific XMRV detection. To consider the provenance of XMRV, we sequenced XMRV from the cell line 22Rv1, which is infected with an MLV-X that is indistinguishable from patient-derived XMRV. Bayesian phylogenies clearly show that XMRV sequences reportedly derived from unlinked patients form a monophyletic clade with interspersed 22Rv1 clones (posterior probability >0.99). The cell line-derived sequences are ancestral to the patient-derived sequences (posterior probability >0.99). Furthermore, pol sequences apparently amplified from PC patient material (VP29 and VP184) are recombinants of XMRV and Moloney MLV (MoMLV), a virus with an envelope that lacks tropism for human cells. Considering the diversity of XMRV, we show that the mean pairwise genetic distance among env and pol 22Rv1-derived sequences exceeds that of patient-associated sequences (Wilcoxon rank sum test: p = 0.005 and p < 0.001 for pol and env, respectively). Thus XMRV sequences acquire diversity in a cell line but not in patient samples. These observations are difficult to reconcile with the hypothesis that published XMRV sequences are related by a process of infectious transmission.

    Conclusions: We provide several independent lines of evidence that XMRV detected by sensitive PCR methods in patient samples is the likely result of PCR contamination with mouse DNA and that the described clones of XMRV arose from the tumour cell line 22Rv1, which was probably infected with XMRV during xenografting in mice. We propose that XMRV might not be a genuine human pathogen.

    Funded by: Medical Research Council: G0801172, G0801172(87743), G9721629; Wellcome Trust: 090940, WT076608, WT090940

    Retrovirology 2010;7;1;111

  • Interleukin-8 mediates resistance to antiangiogenic agent sunitinib in renal cell carcinoma.

    Huang D, Ding Y, Zhou M, Rini BI, Petillo D, Qian CN, Kahnoski R, Futreal PA, Furge KA and Teh BT

    Laboratory of Cancer Genetics, Laboratory of Computational Biology, Van Andel Research Institute, Grand Rapids, Michigan 49503, USA.

    The broad spectrum kinase inhibitor sunitinib is a first-line therapy for advanced clear cell renal cell carcinoma (ccRCC), a deadly form of kidney cancer. Unfortunately, most patients develop sunitinib resistance and progressive disease after about 1 year of treatment. In this study, we evaluated the mechanisms of resistance to sunitinib to identify the potential tactics to overcome it. Xenograft models were generated that mimicked clinical resistance to sunitinib. Higher microvessel density was found in sunitinib-resistant tumors, indicating that an escape from antiangiogenesis occurred. Notably, escape coincided with increased secretion of interleukin-8 (IL-8) from tumors into the plasma, and coadministration of an IL-8 neutralizing antibody resensitized tumors to sunitinib treatment. In patients who were refractory to sunitinib treatment, IL-8 expression was elevated in ccRCC tumors, supporting the concept that IL-8 levels might predict clinical response to sunitinib. Our results reveal IL-8 as an important contributor to sunitinib resistance in ccRCC and a candidate therapeutic target to reverse acquired or intrinsic resistance to sunitinib in this malignancy.

    Funded by: Wellcome Trust: 077012/Z/05/Z

    Cancer research 2010;70;3;1063-71

  • Characterising and predicting haploinsufficiency in the human genome.

    Huang N, Lee I, Marcotte EM and Hurles ME

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.

    Funded by: NIGMS NIH HHS: R01 GM067779; Wellcome Trust: 077014/Z/05/Z

    PLoS genetics 2010;6;10;e1001154

  • Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites.

    Hunt P, Martinelli A, Modrzynska K, Borges S, Creasey A, Rodrigues L, Beraldi D, Loewe L, Fawcett R, Kumar S, Thomson M, Trivedi U, Otto TD, Pain A, Blaxter M and Cravo P

    Institute for Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, UK. Paul.Hunt@ed.ac.uk

    Background: Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum.

    Results: A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme.

    Conclusions: This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D019621/1; Medical Research Council: G0400476, G0900740; Wellcome Trust: 082611/Z/07/Z

    BMC genomics 2010;11;499

  • Systematic analysis of human protein complexes identifies chromosome segregation proteins.

    Hutchins JR, Toyoda Y, Hegemann B, Poser I, Hériché JK, Sykora MM, Augsburg M, Hudecz O, Buschhorn BA, Bulkescher J, Conrad C, Comartin D, Schleiffer A, Sarov M, Pozniakovsky A, Slabicki MM, Schloissnig S, Steinmacher I, Leuschner M, Ssykor A, Lawo S, Pelletier L, Stark H, Nasmyth K, Ellenberg J, Durbin R, Buchholz F, Mechtler K, Hyman AA and Peters JM

    Research Institute of Molecular Pathology (IMP), Dr. Bohr-Gasse 7, A-1030 Vienna, Austria.

    Chromosome segregation and cell division are essential, highly ordered processes that depend on numerous protein complexes. Results from recent RNA interference screens indicate that the identity and composition of these protein complexes is incompletely understood. Using gene tagging on bacterial artificial chromosomes, protein localization, and tandem-affinity purification-mass spectrometry, the MitoCheck consortium has analyzed about 100 human protein complexes, many of which had not or had only incompletely been characterized. This work has led to the discovery of previously unknown, evolutionarily conserved subunits of the anaphase-promoting complex and the gamma-tubulin ring complex--large complexes that are essential for spindle assembly and chromosome segregation. The approaches we describe here are generally applicable to high-throughput follow-up analyses of phenotypic screens in mammalian cells.

    Funded by: Austrian Science Fund FWF: F 3407-B03

    Science (New York, N.Y.) 2010;328;5978;593-9

  • Epilepsy and mental retardation limited to females with PCDH19 mutations can present de novo or in single generation families.

    Hynes K, Tarpey P, Dibbens LM, Bayly MA, Berkovic SF, Smith R, Raisi ZA, Turner SJ, Brown NJ, Desai TD, Haan E, Turner G, Christodoulou J, Leonard H, Gill D, Stratton MR, Gecz J and Scheffer IE

    SA Pathology, Women's and Children's Hospital, 72 King William Road, North Adelaide, SA 5006, Australia.

    Background: Epilepsy and mental retardation limited to females (EFMR) is an intriguing X-linked disorder affecting heterozygous females and sparing hemizygous males. Mutations in the protocadherin 19 (PCDH19) gene have been identified in seven unrelated families with EFMR.

    Methods and results: Here, we assessed the frequency of PCDH19 mutations in individuals with clinical features which overlap those of EFMR. We analysed 185 females from three cohorts: 42 with Rett syndrome who were negative for MECP2 and CDKL5 mutations, 57 with autism spectrum disorders, and 86 with epilepsy with or without intellectual disability. No mutations were identified in the Rett syndrome and autism spectrum disorders cohorts, suggesting that, although these disorders share clinical characteristics with EFMR, PCDH19 mutations are not generally associated with them. Among the 86 females with epilepsy (of whom 51 had seizure onset before 3 years), with or without intellectual disability, we identified two (2.3%) missense changes. One (c.1671C>G, p.N557K), reported previously without clinical data, was found in two affected sisters, the first EFMR family without a multigenerational family history of affected females. The second, reported here, is a novel de novo missense change identified in a sporadic female. The change, p.S276P, is predicted to result in functional disturbance of PCDH19 as it affects a highly conserved residue adjacent to the adhesion interface of EC3 of PCDH19.

    Conclusions: This de novo PCDH19 mutation in a sporadic female highlights that mutational analysis should be considered in isolated instances of girls with infantile onset seizures and developmental delay, in addition to those with the characteristic family history of EFMR.

    Funded by: Wellcome Trust

    Journal of medical genetics 2010;47;3;211-6

  • Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo.

    Ikram MK, Sim X, Xueling S, Jensen RA, Cotch MF, Hewitt AW, Ikram MA, Wang JJ, Klein R, Klein BE, Breteler MM, Cheung N, Liew G, Mitchell P, Uitterlinden AG, Rivadeneira F, Hofman A, de Jong PT, van Duijn CM, Kao L, Cheng CY, Smith AV, Glazer NL, Lumley T, McKnight B, Psaty BM, Jonasson F, Eiriksdottir G, Aspelund T, Global BPgen Consortium, Harris TB, Launer LJ, Taylor KD, Li X, Iyengar SK, Xi Q, Sivakumaran TA, Mackey DA, Macgregor S, Martin NG, Young TL, Bis JC, Wiggins KL, Heckbert SR, Hammond CJ, Andrew T, Fahy S, Attia J, Holliday EG, Scott RJ, Islam FM, Rotter JI, McAuley AK, Boerwinkle E, Tai ES, Gudnason V, Siscovick DS, Vingerling JR and Wong TY

    Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands.

    There is increasing evidence that the microcirculation plays an important role in the pathogenesis of cardiovascular diseases. Changes in retinal vascular caliber reflect early microvascular disease and predict incident cardiovascular events. We performed a genome-wide association study to identify genetic variants associated with retinal vascular caliber. We analyzed data from four population-based discovery cohorts with 15,358 unrelated Caucasian individuals, who are members of the Cohort for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, and replicated findings in four independent Caucasian cohorts (n = 6,652). All participants had retinal photography and retinal arteriolar and venular caliber measured from computer software. In the discovery cohorts, 179 single nucleotide polymorphisms (SNP) spread across five loci were significantly associated (p < 5.0×10(-8)) with retinal venular caliber, but none showed association with arteriolar caliber. Collectively, these five loci explain 1.0%-3.2% of the variation in retinal venular caliber. Four out of these five loci were confirmed in independent replication samples. In the combined analyses, the top SNPs at each locus were: rs2287921 (19q13; p = 1.61×10(-25), within the RASIP1 locus), rs225717 (6q24; p = 1.25×10(-16), adjacent to the VTA1 and NMBR loci), rs10774625 (12q24; p = 2.15×10(-13), in the region of ATXN2, SH2B3 and PTPN11 loci), and rs17421627 (5q14; p = 7.32×10(-16), adjacent to the MEF2C locus). In two independent samples, locus 12q24 was also associated with coronary heart disease and hypertension. Our population-based genome-wide association study demonstrates four novel loci associated with retinal venular caliber, an endophenotype of the microcirculation associated with clinical cardiovascular disease. These data provide further insights into the contribution and biological mechanisms of microcirculatory changes that underlie cardiovascular disease.

    Funded by: Medical Research Council: G0401527, G0701863, G0801056, MC_U105630924, MC_UP_A100_1003; NCRR NIH HHS: M01RR00069, UL1RR025005; NEI NIH HHS: R01 EY018246, Z01 EY000401-06, Z01 EY000401-07, Z01 EY000426-04, Z01 EY000426-05, Z01EY000401, Z01EY000426, Z99 EY999999, ZIA EY000401-08, ZIA EY000401-09, ZIA EY000401-10, ZIA EY000403-09, ZIA EY000403-10, ZIA EY000426-06, ZIA EY000426-07, ZIA EY000426-08; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01HC-55222, R01 HL087652, R01HL087641, T32HL007902, U01 HL080295; NIA NIH HHS: N01-AG-12100, Z01AG007380; NIDDK NIH HHS: DK063491

    PLoS genetics 2010;6;10;e1001184

  • A genome-wide perspective of genetic variation in human metabolism.

    Illig T, Gieger C, Zhai G, Römisch-Margl W, Wang-Sattler R, Prehn C, Altmaier E, Kastenmüller G, Kato BS, Mewes HW, Meitinger T, de Angelis MH, Kronenberg F, Soranzo N, Wichmann HE, Spector TD, Adamski J and Suhre K

    Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.

    Serum metabolite concentrations provide a direct readout of biological processes in the human body, and they are associated with disorders such as cardiovascular and metabolic diseases. We present a genome-wide association study (GWAS) of 163 metabolic traits measured in human blood from 1,809 participants from the KORA population, with replication in 422 participants of the TwinsUK cohort. For eight out of nine replicated loci (FADS1, ELOVL2, ACADS, ACADM, ACADL, SPTLC3, ETFDH and SLC16A9), the genetic variant is located in or near genes encoding enzymes or solute carriers whose functions match the associating metabolic traits. In our study, the use of metabolite concentration ratios as proxies for enzymatic reaction rates reduced the variance and yielded robust statistical associations with P values ranging from 3 x 10(-24) to 6.5 x 10(-179). These loci explained 5.6%-36.3% of the observed variance in metabolite concentrations. For several loci, associations with clinically relevant parameters have been reported previously.

    Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust: 091746

    Nature genetics 2010;42;2;137-41

  • Orphan CpG islands identify numerous conserved promoters in the mammalian genome.

    Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr AR, James KD, Turner DJ, Smith C, Harrison DJ, Andrews R and Bird AP

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.

    CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Querying their apparent importance, the number of CGIs is reported to vary widely in different species and many do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are "orphans" that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.

    Funded by: Medical Research Council: G0800026, G0900627; Wellcome Trust: 077224

    PLoS genetics 2010;6;9;e1001134

  • Detailed physiologic characterization reveals diverse mechanisms for novel genetic Loci regulating glucose and insulin metabolism in humans.

    Ingelsson E, Langenberg C, Hivert MF, Prokopenko I, Lyssenko V, Dupuis J, Mägi R, Sharp S, Jackson AU, Assimes TL, Shrader P, Knowles JW, Zethelius B, Abbasi FA, Bergman RN, Bergmann A, Berne C, Boehnke M, Bonnycastle LL, Bornstein SR, Buchanan TA, Bumpstead SJ, Böttcher Y, Chines P, Collins FS, Cooper CC, Dennison EM, Erdos MR, Ferrannini E, Fox CS, Graessler J, Hao K, Isomaa B, Jameson KA, Kovacs P, Kuusisto J, Laakso M, Ladenvall C, Mohlke KL, Morken MA, Narisu N, Nathan DM, Pascoe L, Payne F, Petrie JR, Sayer AA, Schwarz PE, Scott LJ, Stringham HM, Stumvoll M, Swift AJ, Syvänen AC, Tuomi T, Tuomilehto J, Tönjes A, Valle TT, Williams GH, Lind L, Barroso I, Quertermous T, Walker M, Wareham NJ, Meigs JB, McCarthy MI, Groop L, Watanabe RM, Florez JC and MAGIC investigators

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. erik.ingelsson@ki.se

    Objective: Recent genome-wide association studies have revealed loci associated with glucose and insulin-related traits. We aimed to characterize 19 such loci using detailed measures of insulin processing, secretion, and sensitivity to help elucidate their role in regulation of glucose control, insulin secretion and/or action.

    Research design and methods: We investigated associations of loci identified by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) with circulating proinsulin, measures of insulin secretion and sensitivity from oral glucose tolerance tests (OGTTs), euglycemic clamps, insulin suppression tests, or frequently sampled intravenous glucose tolerance tests in nondiabetic humans (n = 29,084).

    Results: The glucose-raising allele in MADD was associated with abnormal insulin processing (a dramatic effect on higher proinsulin levels, but no association with insulinogenic index) at extremely persuasive levels of statistical significance (P = 2.1 x 10(-71)). Defects in insulin processing and insulin secretion were seen in glucose-raising allele carriers at TCF7L2, SLC30A8, GIPR, and C2CD4B. Abnormalities in early insulin secretion were suggested in glucose-raising allele carriers at MTNR1B, GCK, FADS1, DGKB, and PROX1 (lower insulinogenic index; no association with proinsulin or insulin sensitivity). Two loci previously associated with fasting insulin (GCKR and IGF1) were associated with OGTT-derived insulin sensitivity indices in a consistent direction.

    Conclusions: Genetic loci identified through their effect on hyperglycemia and/or hyperinsulinemia demonstrate considerable heterogeneity in associations with measures of insulin processing, secretion, and sensitivity. Our findings emphasize the importance of detailed physiological characterization of such loci for improved understanding of pathways associated with alterations in glucose homeostasis and eventually type 2 diabetes.

    Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U147574213, MC_U147574239, MC_UP_A620_1014, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIDDK NIH HHS: R01 DK029867

    Diabetes 2010;59;5;1266-75

  • An immune response network associated with blood lipid levels.

    Inouye M, Silander K, Hamalainen E, Salomaa V, Harald K, Jousilahti P, Männistö S, Eriksson JG, Saarela J, Ripatti S, Perola M, van Ommen GJ, Taskinen MR, Palotie A, Dermitzakis ET and Peltonen L

    Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. inouye@wehi.edu.au

    While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.

    Funded by: Wellcome Trust: WT089061, WT089062

    PLoS genetics 2010;6;9;e1001113

  • International network of cancer genome projects.

    International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, Watanabe K, Yang H, Yuen MM, Knoppers BM, Bobrow M, Cambon-Thomsen A, Dressler LG, Dyke SO, Joly Y, Kato K, Kennedy KL, Nicolás P, Parker MJ, Rial-Sebbag E, Romeo-Casabona CM, Shaw KM, Wallace S, Wiesner GL, Zeps N, Lichter P, Biankin AV, Chabannon C, Chin L, Clément B, de Alava E, Degos F, Ferguson ML, Geary P, Hayes DN, Hudson TJ, Johns AL, Kasprzyk A, Nakagawa H, Penny R, Piris MA, Sarin R, Scarpa A, Shibata T, van de Vijver M, Futreal PA, Aburatani H, Bayés M, Botwell DD, Campbell PJ, Estivill X, Gerhard DS, Grimmond SM, Gut I, Hirst M, López-Otín C, Majumder P, Marra M, McPherson JD, Nakagawa H, Ning Z, Puente XS, Ruan Y, Shibata T, Stratton MR, Stunnenberg HG, Swerdlow H, Velculescu VE, Wilson RK, Xue HH, Yang L, Spellman PT, Bader GD, Boutros PC, Campbell PJ, Flicek P, Getz G, Guigó R, Guo G, Haussler D, Heath S, Hubbard TJ, Jiang T, Jones SM, Li Q, López-Bigas N, Luo R, Muthuswamy L, Ouellette BF, Pearson JV, Puente XS, Quesada V, Raphael BJ, Sander C, Shibata T, Speed TP, Stein LD, Stuart JM, Teague JW, Totoki Y, Tsunoda T, Valencia A, Wheeler DA, Wu H, Zhao S, Zhou G, Stein LD, Guigó R, Hubbard TJ, Joly Y, Jones SM, Kasprzyk A, Lathrop M, López-Bigas N, Ouellette BF, Spellman PT, Teague JW, Thomas G, Valencia A, Yoshida T, Kennedy KL, Axton M, Dyke SO, Futreal PA, Gerhard DS, Gunter C, Guyer M, Hudson TJ, McPherson JD, Miller LJ, Ozenberger B, Shaw KM, Kasprzyk A, Stein LD, Zhang J, Haider SA, Wang J, Yung CK, Cros A, Cross A, Liang Y, Gnaneshan S, Guberman J, Hsu J, Bobrow M, Chalmers DR, Hasel KW, Joly Y, Kaan TS, Kennedy KL, Knoppers BM, Lowrance WW, Masui T, Nicolás P, Rial-Sebbag E, Rodriguez LL, Vergely C, Yoshida T, Grimmond SM, Biankin AV, Bowtell DD, Cloonan N, deFazio A, Eshleman JR, Etemadmoghadam D, Gardiner BB, Gardiner BA, Kench JG, Scarpa A, Sutherland RL, Tempero MA, Waddell NJ, Wilson PJ, McPherson JD, Gallinger S, Tsao MS, Shaw PA, Petersen GM, Mukhopadhyay D, Chin L, DePinho RA, Thayer S, Muthuswamy L, Shazand K, Beck T, Sam M, Timms L, Ballin V, Lu Y, Ji J, Zhang X, Chen F, Hu X, Zhou G, Yang Q, Tian G, Zhang L, Xing X, Li X, Zhu Z, Yu Y, Yu J, Yang H, Lathrop M, Tost J, Brennan P, Holcatova I, Zaridze D, Brazma A, Egevard L, Prokhortchouk E, Banks RE, Uhlén M, Cambon-Thomsen A, Viksna J, Ponten F, Skryabin K, Stratton MR, Futreal PA, Birney E, Borg A, Børresen-Dale AL, Caldas C, Foekens JA, Martin S, Reis-Filho JS, Richardson AL, Sotiriou C, Stunnenberg HG, Thoms G, van de Vijver M, van't Veer L, Calvo F, Birnbaum D, Blanche H, Boucher P, Boyault S, Chabannon C, Gut I, Masson-Jacquemier JD, Lathrop M, Pauporté I, Pivot X, Vincent-Salomon A, Tabone E, Theillet C, Thomas G, Tost J, Treilleux I, Calvo F, Bioulac-Sage P, Clément B, Decaens T, Degos F, Franco D, Gut I, Gut M, Heath S, Lathrop M, Samuel D, Thomas G, Zucman-Rossi J, Lichter P, Eils R, Brors B, Korbel JO, Korshunov A, Landgraf P, Lehrach H, Pfister S, Radlwimmer B, Reifenberger G, Taylor MD, von Kalle C, Majumder PP, Sarin R, Rao TS, Bhan MK, Scarpa A, Pederzoli P, Lawlor RA, Delledonne M, Bardelli A, Biankin AV, Grimmond SM, Gress T, Klimstra D, Zamboni G, Shibata T, Nakamura Y, Nakagawa H, Kusada J, Tsunoda T, Miyano S, Aburatani H, Kato K, Fujimoto A, Yoshida T, Campo E, López-Otín C, Estivill X, Guigó R, de Sanjosé S, Piris MA, Montserrat E, González-Díaz M, Puente XS, Jares P, Valencia A, Himmelbauer H, Himmelbaue H, Quesada V, Bea S, Stratton MR, Futreal PA, Campbell PJ, Vincent-Salomon A, Richardson AL, Reis-Filho JS, van de Vijver M, Thomas G, Masson-Jacquemier JD, Aparicio S, Borg A, Børresen-Dale AL, Caldas C, Foekens JA, Stunnenberg HG, van't Veer L, Easton DF, Spellman PT, Martin S, Barker AD, Chin L, Collins FS, Compton CC, Ferguson ML, Gerhard DS, Getz G, Gunter C, Guttmacher A, Guyer M, Hayes DN, Lander ES, Ozenberger B, Penny R, Peterson J, Sander C, Shaw KM, Speed TP, Spellman PT, Vockley JG, Wheeler DA, Wilson RK, Hudson TJ, Chin L, Knoppers BM, Lander ES, Lichter P, Stein LD, Stratton MR, Anderson W, Barker AD, Bell C, Bobrow M, Burke W, Collins FS, Compton CC, DePinho RA, Easton DF, Futreal PA, Gerhard DS, Green AR, Guyer M, Hamilton SR, Hubbard TJ, Kallioniemi OP, Kennedy KL, Ley TJ, Liu ET, Lu Y, Majumder P, Marra M, Ozenberger B, Peterson J, Schafer AJ, Spellman PT, Stunnenberg HG, Wainwright BJ, Wilson RK and Yang H

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

    Funded by: Cancer Research UK: 6613; NCI NIH HHS: P01 CA117969, P01 CA117969-04S1, P01 CA117969-05, P50 CA102701, P50 CA102701-08, P50 CA127003, P50 CA127003-04, P50 CA127003-05; NHGRI NIH HHS: R01 HG001806-02; NIDDK NIH HHS: K08 DK071329, K08 DK071329-04, K08 DK071329-05; Wellcome Trust: 077198, 088340, 093867

    Nature 2010;464;7291;993-8

  • Integrating common and rare genetic variation in diverse human populations.

    International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD and McEwen JE

    Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02138, USA. altshuler@molbio.mgh.harvard.edu

    Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

    Funded by: Medical Research Council: G0000934; NHGRI NIH HHS: U54 HG003273; NIDDK NIH HHS: P30 DK043351; Wellcome Trust: 068545, 068545/Z/02, 076113, 077011, 077014, 082371, 089061, 089062, 091746

    Nature 2010;467;7311;52-8

  • Failure to validate association between 12p13 variants and ischemic stroke.

    International Stroke Genetics Consortium and Wellcome Trust Case-Control Consortium 2

    Funded by: Medical Research Council: G0000934, G0701075, G0800509, G0801418B; NCI NIH HHS: CA 047988; NCRR NIH HHS: M01 RR 165001, M01 RR07122, R54 RR020278; NHGRI NIH HHS: U01 HG004436; NHLBI NIH HHS: HL 043851, HL69757, R01 HL087676, R25 HL088724; NIDDK NIH HHS: P30 DK072488; NINDS NIH HHS: 1R01 NS059727, K08 NS045802, NS056302, NS30678, NS34447, NS36695, R01 NS 42733, R01 NS059727, R01 NS059727-01A1, R01 NS45012, R21NS064908; PHS HHS: P60 12583; Wellcome Trust: 068545/Z/02

    The New England journal of medicine 2010;362;16;1547-50

  • The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis.

    Jackson AP, Sanders M, Berry A, McQuillan J, Aslett MA, Quail MA, Chukualim B, Capewell P, MacLeod A, Melville SE, Gibson W, Barry JD, Berriman M and Hertz-Fowler C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.

    Methods and findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.

    Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.

    Funded by: Wellcome Trust: 079703, 095201, WT085775/Z/08/Z

    PLoS neglected tropical diseases 2010;4;4;e658

  • Reverse engineering a gene network using an asynchronous parallel evolution strategy.

    Jostins L and Jaeger J

    Laboratory for Development & Evolution, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK.

    Background: The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task.

    Results: Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested.

    Conclusions: Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D000513/1, BB/D00513

    BMC systems biology 2010;4;17

  • Typhoid in Kenya is associated with a dominant multidrug-resistant Salmonella enterica serovar Typhi haplotype that is also widespread in Southeast Asia.

    Kariuki S, Revathi G, Kiiru J, Mengo DM, Mwituria J, Muyodi J, Munyalo A, Teo YY, Holt KE, Kingsley RA and Dougan G

    Centre for Microbiology Research, Kenya Medical Research Institute, P.O. Box 43640-00100, Nairobi, Kenya. skariuki@kemri.org

    In sub-Saharan Africa, the burden of typhoid fever, caused by Salmonella enterica serovar Typhi, remains largely unknown, in part because of a lack of blood or bone marrow culture facilities. We characterized a total of 323 S. Typhi isolates from outbreaks in Kenya over the period 1988 to 2008 for antimicrobial susceptibilities and phylogenetic relationships using single-nucleotide polymorphism (SNP) analysis. There was a dramatic increase in the number and percentage of multidrug-resistant (MDR) S. Typhi isolates over the study period. Overall, only 54 (16.7%) S. Typhi isolates were fully sensitive, while the majority, 195 (60.4%), were multiply resistant to most commonly available drugs (ampicillin, chloramphenicol, tetracycline, and cotrimoxazole); 74 (22.9%) isolates were resistant to a single antimicrobial, usually ampicillin, cotrimoxazole, or tetracycline. Resistance to these antibiotics was encoded on self-transferable IncHI1 plasmids of the ST6 sequence type. Of the 94 representative S. Typhi isolates selected for genome-wide haplotype analysis, sensitive isolates fell into several phylogenetically different groups, whereas MDR isolates all belonged to a single haplotype, H58, associated with MDR and decreased ciprofloxacin susceptibility, which is also dominant in many parts of Southeast Asia. Derivatives of the same S. Typhi lineage, H58, are responsible for multidrug resistance in Kenya and parts of Southeast Asia, suggesting intercontinental spread of a single MDR clone. Given the emergence of this aggressive MDR haplotype, careful selection and monitoring of antibiotic usage will be required in Kenya, and potentially other regions of sub-Saharan Africa.

    Funded by: Wellcome Trust: 064616/01/Z.

    Journal of clinical microbiology 2010;48;6;2171-6

  • The burden and characteristics of enteric fever at a healthcare facility in a densely populated area of Kathmandu.

    Karkey A, Arjyal A, Anders KL, Boni MF, Dongol S, Koirala S, My PV, Nga TV, Clements AC, Holt KE, Duy PT, Day JN, Campbell JI, Dougan G, Dolecek C, Farrar J, Basnyat B and Baker S

    Oxford University Clinical Research Unit, Patan Academy of Health Sciences, Lagankhel, Kathmandu, Nepal.

    Enteric fever, caused by Salmonella enterica serovars Typhi and Paratyphi A (S. Typhi and S. Paratyphi A) remains a major public health problem in many settings. The disease is limited to locations with poor sanitation which facilitates the transmission of the infecting organisms. Efficacious and inexpensive vaccines are available for S. Typhi, yet are not commonly deployed to control the disease. Lack of vaccination is due partly to uncertainty of the disease burden arising from a paucity of epidemiological information in key locations. We have collected and analyzed data from 3,898 cases of blood culture-confirmed enteric fever from Patan Hospital in Lalitpur Sub-Metropolitan City (LSMC), between June 2005 and May 2009. Demographic data was available for a subset of these patients (n = 527) that were resident in LSMC and who were enrolled in trials. We show a considerable burden of enteric fever caused by S. Typhi (2,672; 68.5%) and S. Paratyphi A (1,226; 31.5%) at this hospital over a four-year period, which correlate with seasonal fluctuations in rainfall. We found that local population density was not related to incidence and we identified a focus of infections in the east of LSMC. With data from patients resident in LSMC we found that the median age of those with S. Typhi (16 years) was significantly less than S. Paratyphi A (20 years) and that males aged 15 to 25 were disproportionately infected. Our findings provide a snapshot into the epidemiological patterns of enteric fever in Kathmandu. The uneven distribution of enteric fever patients within the population suggests local variation in risk factors, such as contaminated drinking water. These findings are important for initiating a vaccination scheme and improvements in sanitation. We suggest any such intervention should be implemented throughout the LSMC area.

    Funded by: Medical Research Council: G0600718; Wellcome Trust

    PloS one 2010;5;11;e13988

  • Mass Spectrometry for Microbial Proteomics: Issues in Data Analysis with Electrophoretic or Mass Spectrometric Expression Proteomic Data

    Karp, N.

    Mass Spectrometry for Microbial Proteomics 2010;Chapter 18;423-40

  • European lactase persistence genotype shows evidence of association with increase in body mass index.

    Kettunen J, Silander K, Saarela O, Amin N, Müller M, Timpson N, Surakka I, Ripatti S, Laitinen J, Hartikainen AL, Pouta A, Lahermo P, Anttila V, Männistö S, Jula A, Virtamo J, Salomaa V, Lehtimäki T, Raitakari O, Gieger C, Wichmann EH, Van Duijn CM, Smith GD, McCarthy MI, Järvelin MR, Perola M and Peltonen L

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. johannes@sanger.ac.uk

    The global prevalence of obesity has increased significantly in recent decades, mainly due to excess calorie intake and increasingly sedentary lifestyle. Here, we test the association between obesity measured by body mass index (BMI) and one of the best-known genetic variants showing strong selective pressure: the functional variant in the cis-regulatory element of the lactase gene. We tested this variant since it is presumed to provide nutritional advantage in specific physical and cultural environments. We genetically defined lactase persistence (LP) in 31 720 individuals from eight European population-based studies and one family study by genotyping or imputing the European LP variant (rs4988235). We performed a meta-analysis by pooling the beta-coefficient estimates of the relationship between rs4988235 and BMI from the nine studies and found that the carriers of the allele responsible for LP among Europeans showed higher BMI (P = 7.9 x 10(-5)). Since this locus has been shown to be prone to population stratification, we paid special attention to reveal any population substructure which might be responsible for the association signal. The best evidence of exclusion of stratification came from the Dutch family sample which is robust for stratification. In this study, we highlight issues in model selection in the genome-wide association studies and problems in imputation of these special genomic regions.

    Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; Medical Research Council: G0600705; NCI NIH HHS: N01-CN-45165; NHLBI NIH HHS: 1-R01-HL087679-01

    Human molecular genetics 2010;19;6;1129-36

  • Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe.

    Kim DU, Hayles J, Kim D, Wood V, Park HO, Won M, Yoo HS, Duhig T, Nam M, Palmer G, Han S, Jeffery L, Baek ST, Lee H, Shim YS, Lee M, Kim L, Heo KS, Noh EJ, Lee AR, Jang YJ, Chung KS, Choi SJ, Park JY, Park Y, Kim HM, Park SK, Park HJ, Kang EJ, Kim HB, Kang HS, Park HM, Kim K, Song K, Song KB, Nurse P and Hoe KL

    Integrative Omics Research Centre, Korea Research Institute of Bioscience and Biotechnology, Yuseong, Daejeon, Korea.

    We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome providing a tool for studying eukaryotic biology. Comprehensive gene dispensability comparisons with budding yeast--the only other eukaryote for which a comprehensive knockout library exists--revealed that 83% of single-copy orthologs in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than nonessential genes to be present in a single copy, to be broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth.

    Funded by: Cancer Research UK; Wellcome Trust: 093917

    Nature biotechnology 2010;28;6;617-23

  • Loss of NPC1 function in a patient with a co-inherited novel insulin receptor mutation does not grossly modify the severity of the associated insulin resistance.

    Kirk J, Porter KM, Parker V, Barroso I, O'Rahilly S, Hendriksz C and Semple RK

    Department of Endocrinology, Birmingham Children's Hospital, Steelhouse Lane, Birmingham B4 6NH, United Kingdom.

    In Npc1 null mice, a model for Niemann Pick Disease Type C1, it has been reported that hepatocyte insulin receptor function is significantly impaired, consistent with growing evidence that membrane fluidity and microdomain structure have an important role in insulin signal transduction. However, whether insulin receptor function is also compromised in human Niemann Pick disease Type C1 is unclear. We now report a girl who developed progressive dementia, ataxia and ophthalmoplegia from 9 years old, followed by severe acanthosis nigricans, hirsutism and acne at 11 years old. She was diagnosed with Niemann Pick Disease type C1 (OMIM#257220) based on positive filipin staining and reduced cholesterol-esterifying activity in dermal fibroblasts, and homozygosity for the p.Ile1061Thr NPC1 mutation. Further analysis revealed her also to be heterozygous for a novel trinucleotide deletion (c.3659 + 1_3659 + 3delGTG) at the end of exon 20 of INSR, encoding the insulin receptor, leading to deletion of Trp1193 in the intracellular tyrosine kinase domain. INSR mRNA and protein levels were normal in dermal fibroblasts, consistent with a primary signal transduction defect in the mutant receptor. Although the proband was significantly more insulin resistant than her father, who carried the INSR mutation but was only heterozygous for the NPC1 variant, their respective degrees of IR were very similar to those previously reported in a father-daughter pair with the closely related p.Trp1193Leu INSR mutation. This suggests that loss of NPC1 function, with attendant changes in membrane cholesterol composition, does not significantly modify the IR phenotype, even in the context of severely impaired INSR function.

    Funded by: Medical Research Council; Wellcome Trust: 077016, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z

    Journal of inherited metabolic disease 2010;33 Suppl 3;S227-32

  • Identification of networks of co-occurring, tumor-related DNA copy number changes using a genome-wide scoring approach.

    Klijn C, Bot J, Adams DJ, Reinders M, Wessels L and Jonkers J

    Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

    Tumorigenesis is a multi-step process in which normal cells transform into malignant tumors following the accumulation of genetic mutations that enable them to evade the growth control checkpoints that would normally suppress their growth or result in apoptosis. It is therefore important to identify those combinations of mutations that collaborate in cancer development and progression. DNA copy number alterations (CNAs) are one of the ways in which cancer genes are deregulated in tumor cells. We hypothesized that synergistic interactions between cancer genes might be identified by looking for regions of co-occurring gain and/or loss. To this end we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. The resulting regions of high co-occurrence can be investigated for between-region functional interactions. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively, showing that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. These networks are also highly enriched for functional relationships between genes. We further examine sub-networks of these networks, core networks, which contain many known cancer genes. The core network for co-occurring DNA losses we find seems to be independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number alterations may be an important feature of cancer development or maintenance by affecting gene dosage of a large interconnected network of functionally related genes.

    PLoS computational biology 2010;6;1;e1000631

  • AnnoTrack--a tracking system for genome annotation.

    Kokocinski F, Harrow J and Hubbard T

    Vertebrate Genome Analysis, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. fsk@sanger.ac.uk

    Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.

    Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.

    Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.

    Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z

    BMC genomics 2010;11;538

  • Slingshot: a PiggyBac based transposon system for tamoxifen-inducible 'self-inactivating' insertional mutagenesis.

    Kong J, Wang F, Brenton JD and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have developed a self-inactivating PiggyBac transposon system for tamoxifen inducible insertional mutagenesis from a stably integrated chromosomal donor. This system, which we have named 'Slingshot', utilizes a transposon carrying elements for both gain- and loss-of-function screens in vitro. We show that the Slingshot transposon can be efficiently mobilized from a range of chromosomal loci with high inducibility and low background generating insertions that are randomly dispersed throughout the genome. Furthermore, we show that once the Slingshot transposon has been mobilized it is not remobilized producing stable clonal integrants in all daughter cells. To illustrate the efficacy of Slingshot as a screening tool we set out to identify mediators of resistance to puromycin and the chemotherapeutic drug vincristine by performing genetrap screens in mouse embryonic stem cells. From these genome-wide screens we identified multiple independent insertions in the multidrug resistance transporter genes Abcb1a/b and Abcg2 conferring resistance to drug treatment. Importantly, we also show that the Slingshot transposon system is functional in other mammalian cell lines such as human HEK293, OVCAR-3 and PE01 cells suggesting that it may be used in a range of cell culture systems. Slingshot represents a flexible and potent system for genome-wide transposon-mediated mutagenesis with many potential applications.

    Funded by: Cancer Research UK; Wellcome Trust

    Nucleic acids research 2010;38;18;e173

  • Insertional mutagenesis in mice deficient for p15Ink4b, p16Ink4a, p21Cip1, and p27Kip1 reveals cancer gene interactions and correlations with tumor phenotypes.

    Kool J, Uren AG, Martins CP, Sie D, de Ridder J, Turner G, van Uitert M, Matentzoglu K, Lagcher W, Krimpenfort P, Gadiot J, Pritchard C, Lenz J, Lund AH, Jonkers J, Rogers J, Adams DJ, Wessels L, Berns A and van Lohuizen M

    Division of Molecular Genetics, The Centre of Biomedical Genetics, Academic Medical Center and Cancer Genomics Centre, Netherlands Cancer Institute, 1066CX, Amsterdam, the Netherlands.

    The cyclin dependent kinase (CDK) inhibitors p15, p16, p21, and p27 are frequently deleted, silenced, or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, showing that these genes function as tumor suppressors. Here, we describe high-throughput murine leukemia virus insertional mutagenesis screens in mice that are deficient for one or two CDK inhibitors. We retrieved 9,117 retroviral insertions from 476 lymphomas to define hundreds of loci that are mutated more frequently than expected by chance. Many of these loci are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also found associations between these loci with gender, age of tumor onset, and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with single nucleotide polymorphisms associated with chronic lymphocytic leukemia revealed a significant overlap between the datasets. Together, our findings highlight the importance of genetic context within large-scale mutation detection studies, and they show a novel use for insertional mutagenesis data in prioritizing disease-associated genes that emerge from genome-wide association studies.

    Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356

    Cancer research 2010;70;2;520-31

  • Microindel detection in short-read sequence data.

    Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S and Robinson PN

    Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin. peter.krawitz@googlemail.com

    Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.

    Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region eir, which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contains indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.

    Contact: peter.krawitz@googlemail.com; peter.robinson@charite.de

    Supplementary information: Supplementary data are available at Bioinformatics online.

    Bioinformatics (Oxford, England) 2010;26;6;722-9

  • Hundreds of variants clustered in genomic loci and biological pathways affect human height.

    Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segrè AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Mägi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, Winkler TW, Goddard ME, Sin Lo K, Palmer C, Workalemahu T, Aulchenko YS, Johansson A, Zillikens MC, Feitosa MF, Esko T, Johnson T, Ketkar S, Kraft P, Mangino M, Prokopenko I, Absher D, Albrecht E, Ernst F, Glazer NL, Hayward C, Hottenga JJ, Jacobs KB, Knowles JW, Kutalik Z, Monda KL, Polasek O, Preuss M, Rayner NW, Robertson NR, Steinthorsdottir V, Tyrer JP, Voight BF, Wiklund F, Xu J, Zhao JH, Nyholt DR, Pellikka N, Perola M, Perry JR, Surakka I, Tammesoo ML, Altmaier EL, Amin N, Aspelund T, Bhangale T, Boucher G, Chasman DI, Chen C, Coin L, Cooper MN, Dixon AL, Gibson Q, Grundberg E, Hao K, Juhani Junttila M, Kaplan LM, Kettunen J, König IR, Kwan T, Lawrence RW, Levinson DF, Lorentzon M, McKnight B, Morris AP, Müller M, Suh Ngwa J, Purcell S, Rafelt S, Salem RM, Salvi E, Sanna S, Shi J, Sovio U, Thompson JR, Turchin MC, Vandenput L, Verlaan DJ, Vitart V, White CC, Ziegler A, Almgren P, Balmforth AJ, Campbell H, Citterio L, De Grandi A, Dominiczak A, Duan J, Elliott P, Elosua R, Eriksson JG, Freimer NB, Geus EJ, Glorioso N, Haiqing S, Hartikainen AL, Havulinna AS, Hicks AA, Hui J, Igl W, Illig T, Jula A, Kajantie E, Kilpeläinen TO, Koiranen M, Kolcic I, Koskinen S, Kovacs P, Laitinen J, Liu J, Lokki ML, Marusic A, Maschio A, Meitinger T, Mulas A, Paré G, Parker AN, Peden JF, Petersmann A, Pichler I, Pietiläinen KH, Pouta A, Ridderstråle M, Rotter JI, Sambrook JG, Sanders AR, Schmidt CO, Sinisalo J, Smit JH, Stringham HM, Bragi Walters G, Widen E, Wild SH, Willemsen G, Zagato L, Zgaga L, Zitting P, Alavere H, Farrall M, McArdle WL, Nelis M, Peters MJ, Ripatti S, van Meurs JB, Aben KK, Ardlie KG, Beckmann JS, Beilby JP, 
Bergman RN, Bergmann S, Collins FS, Cusi D, den Heijer M, Eiriksdottir G, Gejman PV, Hall AS, Hamsten A, Huikuri HV, Iribarren C, Kähönen M, Kaprio J, Kathiresan S, Kiemeney L, Kocher T, Launer LJ, Lehtimäki T, Melander O, Mosley TH, Musk AW, Nieminen MS, O'Donnell CJ, Ohlsson C, Oostra B, Palmer LJ, Raitakari O, Ridker PM, Rioux JD, Rissanen A, Rivolta C, Schunkert H, Shuldiner AR, Siscovick DS, Stumvoll M, Tönjes A, Tuomilehto J, van Ommen GJ, Viikari J, Heath AC, Martin NG, Montgomery GW, Province MA, Kayser M, Arnold AM, Atwood LD, Boerwinkle E, Chanock SJ, Deloukas P, Gieger C, Grönberg H, Hall P, Hattersley AT, Hengstenberg C, Hoffman W, Lathrop GM, Salomaa V, Schreiber S, Uda M, Waterworth D, Wright AF, Assimes TL, Barroso I, Hofman A, Mohlke KL, Boomsma DI, Caulfield MJ, Cupples LA, Erdmann J, Fox CS, Gudnason V, Gyllensten U, Harris TB, Hayes RB, Jarvelin MR, Mooser V, Munroe PB, Ouwehand WH, Penninx BW, Pramstaller PP, Quertermous T, Rudan I, Samani NJ, Spector TD, Völzke H, Watkins H, Wilson JF, Groop LC, Haritunians T, Hu FB, Kaplan RC, Metspalu A, North KE, Schlessinger D, Wareham NJ, Hunter DJ, O'Connell JR, Strachan DP, Wichmann HE, Borecki IB, van Duijn CM, Schadt EE, Thorsteinsdottir U, Peltonen L, Uitterlinden AG, Visscher PM, Chatterjee N, Loos RJ, Boehnke M, McCarthy MI, Ingelsson E, Lindgren CM, Abecasis GR, Stefansson K, Frayling TM and Hirschhorn JN

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter EX1 2LU, UK.

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

    Funded by: British Heart Foundation: PG/02/128, PG/02/128/14470; Cancer Research UK; Chief Scientist Office: CZB/4/276, CZB/4/279, CZB/4/710; Medical Research Council: G0000649, G0000934, G0500539, G0600331, G0600331(77796), G0601261, G0701863, G9521010, G9521010(63660), G9521010D, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262, R01 CA067262-14, R01 CA104021, R01 CA104021-02, U01 CA049449, U01 CA049449-21, U01 CA098233, U01 CA098233-08, U01-CA098233; NCRR NIH HHS: M01-RR00425, U54-RR020278, UL1-RR025005; NHGRI NIH HHS: HG002651, HG005214, HG005581, R01 HG002651, R01 HG002651-05, RC2 HG005581, RC2 HG005581-02, T32-HG00040, U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, U01 HG005214, U01 HG005214-02, U01-HG004399, U01-HG004402, Z01-HG000024; NHLBI NIH HHS: HL043851, HL084729, HL69757, HL71981, K99-HL094535, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC55018, N01-HC55019, N01-HC55020, N01-HC55021, N01-HC55022, N01-HC55222, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, N02-HL-6-4278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981, R01 HL071981-07, R01 HL086694, R01 HL086694-02, R01 HL087641, R01 HL087641-01, R01 HL087647, R01 HL087647-01, R01 HL087652, R01 HL087652-01, R01 HL087676, R01 HL087676-01, R01 HL087679-01, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-01, R01-HL086694, R01-HL087641, R01-HL087647, R01-HL087652, R01-HL087676, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL59367, U01 HL069757, U01 HL069757-10, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084729, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL080295, U01-HL084756, U01-HL72515; NIA NIH 
HHS: N01-AG12100, N01-AG12109, R01 AG031890, R01 AG031890-02, R01-AG031890, Z01-AG00675, Z01-AG007380; NIAAA NIH HHS: AA014041, AA07535, AA10248, AA13320, AA13321, AA13326, K05 AA017688, R01 AA007535, R01 AA007535-08, R01 AA013320-04, R01 AA013321, R01 AA013321-05, R01 AA013326-05, R01 AA014041-05; NIAMS NIH HHS: K08 AR055688, K08 AR055688-03, K08 AR055688-04, K08-AR055688; NIDA NIH HHS: DA12854, R01 DA012854, R01 DA012854-09; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK079466, DK080145, DK46200, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30 DK072488, R01 DK058845, R01 DK058845-11, R01 DK068336, R01 DK068336-01, R01 DK072193, R01 DK072193-05, R01 DK073490, R01 DK073490-01, R01 DK075681, R01 DK075681-02, R01 DK075787, R01 DK075787-03, R01 DK089256, R01 DK091718, R01-DK068336, R01-DK073490, R01-DK075681, R01-DK075787, T32 DK007191, U01 DK062370-08, U01 DK062418; NIGMS NIH HHS: U01 GM074518, U01 GM074518-05, U01-GM074518; NIMH NIH HHS: MH084698, R01 MH059160, R01 MH059160-04, R01 MH059565, R01 MH059565-06, R01 MH059566, R01 MH059566-08, R01 MH059571, R01 MH059571-05, R01 MH059586, R01 MH059586-08, R01 MH059587-09, R01 MH059588-08, R01 MH060870-09, R01 MH060879-08, R01 MH061675, R01 MH061675-09, R01 MH067257-04, R01 MH081800, R01 MH081800-01, R01-MH059160, R01-MH59565, R01-MH59566, R01-MH59571, R01-MH59586, R01-MH59587, R01-MH59588, R01-MH60870, R01-MH60879, R01-MH61675, R01-MH63706, R01-MH67257, R01-MH79469, R01-MH81800, RL1 MH083268, RL1 MH083268-05, RL1-MH083268, U01 MH079469, U01 MH079469-03, U01 MH079470, U01 MH079470-03, U01-MH79469, U01-MH79470; PHS HHS: 263-MA-410953, HHSN268200625226C, N01-G65403; Wellcome Trust: 064890, 068545, 068545/Z/02, 072856, 072960, 075491, 076113, 076113/B/04/Z, 076113/C/04/Z, 077016, 077016/Z/05/Z, 079557, 079771, 079895, 081682, 081682/Z/06/Z, 083270, 084183/Z/07/Z, 085301, 085301/Z/08/Z, 086596, 086596/Z/08/Z, 088885, 090532, 091746, 091746/Z/10/Z

    Nature 2010;467;7317;832-8

  • Use of purified Clostridium difficile spores to facilitate evaluation of health care disinfection regimens.

    Lawley TD, Clare S, Deakin LJ, Goulding D, Yen JL, Raisen C, Brandt C, Lovell J, Cooke F, Clark TG and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. tl2@sanger.ac.uk

    Clostridium difficile is a major cause of antibiotic-associated diarrheal disease in many parts of the world. In recent years, distinct genetic variants of C. difficile that cause severe disease and persist within health care settings have emerged. Highly resistant and infectious C. difficile spores are proposed to be the main vectors of environmental persistence and host transmission, so methods to accurately monitor spores and their inactivation are urgently needed. Here we describe simple quantitative methods, based on purified C. difficile spores and a murine transmission model, for evaluating health care disinfection regimens. We demonstrate that disinfectants that contain strong oxidizing active ingredients, such as hydrogen peroxide, are very effective in inactivating pure spores and blocking spore-mediated transmission. Complete inactivation of 10⁶ pure C. difficile spores on indicator strips, a six-log reduction and a standard measure of stringent disinfection regimens, requires at least 5 min of exposure to hydrogen peroxide vapor (HPV; 400 ppm). In contrast, a 1-min treatment with HPV was required to disinfect an environment that was heavily contaminated with C. difficile spores (17 to 29 spores/cm²) and block host transmission. Thus, pure C. difficile spores facilitate practical methods for evaluating the efficacy of C. difficile spore disinfection regimens and bringing scientific acumen to C. difficile infection control.

    Funded by: Medical Research Council: G0901743; Wellcome Trust

    Applied and environmental microbiology 2010;76;20;6895-900

  • CCRaVAT and QuTie - enabling analysis of rare variants in large-scale case control and quantitative trait association studies.

    Lawrence R, Day-Williams AG, Elliott KS, Morris AP and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    Background: Genome-wide association studies have been successful in finding common variants influencing common traits. However, these associations only account for a fraction of trait heritability. There has been a shift in the field towards studying low frequency and rare variants, which are now widely recognised as putative complex trait determinants. Despite this increasing focus on examining the role of low frequency and rare variants in complex disease susceptibility, there is a lack of user-friendly analytical packages implementing powerful association tests for the analysis of rare variants.

    Results: We have developed two software tools, CCRaVAT (Case-Control Rare Variant Analysis Tool) and QuTie (Quantitative Trait), which enable efficient large-scale analysis of low frequency and rare variants. Both programs implement a collapsing method examining the accumulation of low frequency and rare variants across a locus of interest that has more power than single variant analysis. CCRaVAT carries out case-control analyses whereas QuTie has been developed for continuous trait analysis.

    Conclusions: CCRaVAT and QuTie are easy to use software tools that allow users to perform genome-wide association analysis on low frequency and rare variants for both binary and quantitative traits. The software is freely available and provides the genetics community with a resource to perform association analysis on rarer genetic variants.

    Funded by: Wellcome Trust: 064890, 079557, 079557MA, 081682, WT088885/Z/09/Z

    BMC bioinformatics 2010;11;527

  • Phylogenetic analysis of gene structure and alternative splicing in alpha-actinins.

    Lek M, MacArthur DG, Yang N and North KN

    Institute for Neuroscience and Muscle Research, The Children's Hospital at Westmead, Sydney, NSW, Australia.

    The alpha-actinins are an important family of actin-binding proteins with the ability to cross-link actin filaments when in dimer form. Members of the alpha-actinin family share a domain topology composed of highly conserved actin-binding and EF-hand domains separated by a rod domain composed of spectrin-like repeats. Functional diversity within this family has arisen through exon duplication and the formation of alternate splice isoforms as well as gene duplications during the evolution of vertebrates. In addition to the known functional domains, alpha-actinins also contain a consensus PDZ-binding site. The completed genome sequence of over 32 invertebrate species has allowed the analysis of gene structure and exon-gene duplication over a diverse range of phyla. Our analysis shows that relative to early branching metazoans, there has been considerable intron loss especially in arthropods with few cases of intron gains. The C-terminal PDZ-binding site is conserved in nearly all invertebrates but is missing in some nematodes and platyhelminths. Alternative splicing in the actin-binding domain is conserved in chordates, arthropods, and some nematodes and platyhelminths. In contrast, alternative splicing of the EF-hand domain is only observed in chordates. Finally, given the prevalence of exon duplications seen in the actin-binding domain, this may act as a significant mechanism in the modification of actin-binding properties.

    Molecular biology and evolution 2010;27;4;773-80

  • The genome of a pathogenic rhodococcus: cooptive virulence underpinned by key gene acquisitions.

    Letek M, González P, Macarthur I, Rodríguez H, Freeman TC, Valero-Rello A, Blanco M, Buckley T, Cherevach I, Fahey R, Hapeshi A, Holdstock J, Leadon D, Navas J, Ocampo A, Quail MA, Sanders M, Scortti MM, Prescott JF, Fogarty U, Meijer WG, Parkhill J, Bentley SD and Vázquez-Boland JA

    Microbial Pathogenesis Unit, Centres for Infectious Diseases and Immunity, Infection, and Evolution, University of Edinburgh, Edinburgh, United Kingdom.

    We report the genome of the facultative intracellular parasite Rhodococcus equi, the only animal pathogen within the biotechnologically important actinobacterial genus Rhodococcus. The 5.0-Mb R. equi 103S genome is significantly smaller than those of environmental rhodococci. This is due to genome expansion in nonpathogenic species, via a linear gain of paralogous genes and an accelerated genetic flux, rather than reductive evolution in R. equi. The 103S genome lacks the extensive catabolic and secondary metabolic complement of environmental rhodococci, and it displays unique adaptations for host colonization and competition in the short-chain fatty acid-rich intestine and manure of herbivores--two main R. equi reservoirs. Except for a few horizontally acquired (HGT) pathogenicity loci, including a cytoadhesive pilus determinant (rpl) and the virulence plasmid vap pathogenicity island (PAI) required for intramacrophage survival, most of the potential virulence-associated genes identified in R. equi are conserved in environmental rhodococci or have homologs in nonpathogenic Actinobacteria. This suggests a mechanism of virulence evolution based on the cooption of existing core actinobacterial traits, triggered by key host niche-adaptive HGT events. We tested this hypothesis by investigating R. equi virulence plasmid-chromosome crosstalk, by global transcription profiling and expression network analysis. Two chromosomal genes conserved in environmental rhodococci, encoding putative chorismate mutase and anthranilate synthase enzymes involved in aromatic amino acid biosynthesis, were strongly coregulated with vap PAI virulence genes and required for optimal proliferation in macrophages. The regulatory integration of chromosomal metabolic genes under the control of the HGT-acquired plasmid PAI is thus an important element in the cooptive virulence of R. equi.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F003722/1

    PLoS genetics 2010;6;9;e1001145

  • MicroRNAs in mouse development and disease.

    Lewis MA and Steel KP

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

    MicroRNAs, small non-coding RNAs which act as repressors of target genes, were discovered in 1993, and since then have been shown to play important roles in the development of numerous systems. Consistent with this role, they are also implicated in the pathogenesis of multiple diseases. Here we review the involvement of microRNAs in mouse development and disease, with particular reference to deafness as an example.

    Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust

    Seminars in cell & developmental biology 2010;21;7;774-80

  • Reprogramming of T cells to natural killer-like cells upon Bcl11b deletion.

    Li P, Burke S, Wang J, Chen X, Ortiz M, Lee SC, Lu D, Campos L, Goulding D, Ng BL, Dougan G, Huntly B, Gottgens B, Jenkins NA, Copeland NG, Colucci F and Liu P

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.

    T cells develop in the thymus and are critical for adaptive immunity. Natural killer (NK) lymphocytes constitute an essential component of the innate immune system in tumor surveillance, reproduction, and defense against microbes and viruses. Here, we show that the transcription factor Bcl11b was expressed in all T cell compartments and was indispensable for T lineage development. When Bcl11b was deleted, T cells from all developmental stages acquired NK cell properties and concomitantly lost or decreased T cell-associated gene expression. These induced T-to-natural killer (ITNK) cells, which were morphologically and genetically similar to conventional NK cells, killed tumor cells in vitro, and effectively prevented tumor metastasis in vivo. Therefore, ITNKs may represent a new cell source for cell-based therapies.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0501150, G0800784, G116/187; Wellcome Trust: 076962, 077186

    Science (New York, N.Y.) 2010;329;5987;85-9

  • Meta-analysis and imputation refines the association of 15q25 with smoking quantity.

    Liu JZ, Tozzi F, Waterworth DM, Pillai SG, Muglia P, Middleton L, Berrettini W, Knouff CW, Yuan X, Waeber G, Vollenweider P, Preisig M, Wareham NJ, Zhao JH, Loos RJ, Barroso I, Khaw KT, Grundy S, Barter P, Mahley R, Kesaniemi A, McPherson R, Vincent JB, Strauss J, Kennedy JL, Farmer A, McGuffin P, Day R, Matthews K, Bakke P, Gulsvik A, Lucae S, Ising M, Brueckl T, Horstmann S, Wichmann HE, Rawal R, Dahmen N, Lamina C, Polasek O, Zgaga L, Huffman J, Campbell S, Kooner J, Chambers JC, Burnett MS, Devaney JM, Pichard AD, Kent KM, Satler L, Lindsay JM, Waksman R, Epstein S, Wilson JF, Wild SH, Campbell H, Vitart V, Reilly MP, Li M, Qu L, Wilensky R, Matthai W, Hakonarson HH, Rader DJ, Franke A, Wittig M, Schäfer A, Uda M, Terracciano A, Xiao X, Busonero F, Scheet P, Schlessinger D, St Clair D, Rujescu D, Abecasis GR, Grabe HJ, Teumer A, Völzke H, Petersmann A, John U, Rudan I, Hayward C, Wright AF, Kolcic I, Wright BJ, Thompson JR, Balmforth AJ, Hall AS, Samani NJ, Anderson CA, Ahmad T, Mathew CG, Parkes M, Satsangi J, Caulfield M, Munroe PB, Farrall M, Dominiczak A, Worthington J, Thomson W, Eyre S, Barton A, Wellcome Trust Case Control Consortium, Mooser V, Francks C and Marchini J

    Department of Statistics, University of Oxford, Oxford, UK.

    Smoking is a leading global cause of disease and mortality. We established the Oxford-GlaxoSmithKline study (Ox-GSK) to perform a genome-wide meta-analysis of SNP association with smoking-related behavioral traits. Our final data set included 41,150 individuals drawn from 20 disease, population and control cohorts. Our analysis confirmed an effect on smoking quantity at a locus on 15q25 (P = 9.45 × 10⁻¹⁹) that includes CHRNA5, CHRNA3 and CHRNB4, three genes encoding neuronal nicotinic acetylcholine receptor subunits. We used data from the 1000 Genomes project to investigate the region using imputation, which allowed for analysis of virtually all common SNPs in the region and offered a fivefold increase in marker density over HapMap2 (ref. 2) as an imputation reference panel. Our fine-mapping approach identified a SNP showing the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.

    Funded by: Chief Scientist Office: CZB/4/540, CZB/4/710, ETM/75; Intramural NIH HHS: Z99 AG999999, ZIA AG000196-03, ZIA AG000196-04; Medical Research Council: G0401527, G0600329, G0701863, G0800759, G9521010, G9817803B, MC_U106179471, MC_U106188470, MC_U127561128

    Nature genetics 2010;42;5;436-40

  • Characterisations of odorant-binding proteins in the tsetse fly Glossina morsitans morsitans.

    Liu R, Lehane S, He X, Lehane M, Hertz-Fowler C, Berriman M, Pickett JA, Field LM and Zhou JJ

    Department of Biological Chemistry, Harpenden, UK.

    Odorant-binding proteins (OBPs) play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. We report for the first time 20 OBP genes in the tsetse fly Glossina morsitans morsitans. qRT-PCR revealed that 8 of these genes were highly transcribed in the antennae. The transcription of these genes in the antennae was significantly lower in males than in females and there was a clear correlation between OBP gene transcription and feeding status. Starvation over 72 h post-blood meal (PBM) did not significantly affect the transcription. However, the transcription in the antennae of 10-week-old flies was much higher than in 3-day-old flies at 48 h PBM and decreased sharply after 72 h starvation, suggesting that the OBP gene expression is affected by the insect's nutritional status. Sequence comparisons with OBPs of other Dipterans identified several homologs to sex pheromone-binding proteins and OBPs of Drosophila melanogaster.

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Cellular and molecular life sciences : CMLS 2010;67;6;919-29

  • Origin of the human malaria parasite Plasmodium falciparum in gorillas.

    Liu W, Li Y, Learn GH, Rudicell RS, Robertson JD, Keele BF, Ndjango JB, Sanz CM, Morgan DB, Locatelli S, Gonder MK, Kranzusch PJ, Walsh PD, Delaporte E, Mpoudi-Ngole E, Georgiev AV, Muller MN, Shaw GM, Peeters M, Sharp PM, Rayner JC and Hahn BH

    Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.

    Plasmodium falciparum is the most prevalent and lethal of the malaria parasites infecting humans, yet the origin and evolutionary history of this important pathogen remain controversial. Here we develop a single-genome amplification strategy to identify and characterize Plasmodium spp. DNA sequences in faecal samples from wild-living apes. Among nearly 3,000 specimens collected from field sites throughout central Africa, we found Plasmodium infection in chimpanzees (Pan troglodytes) and western gorillas (Gorilla gorilla), but not in eastern gorillas (Gorilla beringei) or bonobos (Pan paniscus). Ape plasmodial infections were highly prevalent, widely distributed and almost always made up of mixed parasite species. Analysis of more than 1,100 mitochondrial, apicoplast and nuclear gene sequences from chimpanzees and gorillas revealed that 99% grouped within one of six host-specific lineages representing distinct Plasmodium species within the subgenus Laverania. One of these from western gorillas comprised parasites that were nearly identical to P. falciparum. In phylogenetic analyses of full-length mitochondrial sequences, human P. falciparum formed a monophyletic lineage within the gorilla parasite radiation. These findings indicate that P. falciparum is of gorilla origin and not of chimpanzee, bonobo or ancient human origin.

    Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: P30 AI 7767, P30 AI027767, P30 AI027767-21A1, R01 AI058715, R01 AI058715-06A1, R01 AI058715-07, R01 AI50529, R03 AI074778, R03 AI074778-02, R37 AI050529, R37 AI050529-07, R37 AI050529-08, T32 AI007245, T32 AI007245-26, U19 AI 067854, U19 AI067854, U19 AI067854-06; NIGMS NIH HHS: T32 GM008111, T32 GM008111-13; PHS HHS: R01 I58715; Wellcome Trust

    Nature 2010;467;7314;420-5

  • Single genome amplification and direct amplicon sequencing of Plasmodium spp. DNA from ape fecal specimens.

    Liu W, Li Y, Peeters M, Rayner J, Sharp P, Shaw G and Hahn B

    Department of Medicine, University of Alabama at Birmingham.

    Conventional PCR followed by molecular cloning and sequencing of amplified products is commonly used to test clinical specimens for target sequences of interest, such as viral, bacterial or parasite nucleic acids. However, this approach has serious limitations when used to analyze mixtures of genetically divergent templates(1-9). This is because Taq polymerase is prone to switch templates during the amplification process, thereby generating recombinants that do not exist in vivo (4). When amplicons are cloned prior to sequence analysis, the resulting sequences may also contain a substantial number of Taq-induced substitutions(1-4). Finally, cloning of amplicons can lead to a non-proportional representation of sequences due to the re-sampling of only certain templates(1-4). These confounders can be avoided by using single genome amplification (SGA) followed by direct sequencing of SGA amplicons(1-5). While SGA is not required for many research applications, we have shown it to be essential for deciphering the diversification pathways of human and simian immunodeficiency viruses (HIV/SIV) in acute and chronic infection(4-7), the detection of simian foamy virus (SFVCPZ) super-infection in wild-living chimpanzees(8), and most recently, the molecular identification and characterization of Plasmodium spp. infections in wild-living apes(9). Here, we describe SGA-direct amplicon sequencing of Plasmodium spp. DNA from ape fecal samples.

    Protocol exchange 2010;2010

  • Ten simple rules for editing Wikipedia.

    Logan DW, Sandal M, Gardner PP, Manske M and Bateman A

    PLoS computational biology 2010;6;9

  • Loss-of-function variants in the genomes of healthy humans.

    MacArthur DG and Tyler-Smith C

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK. dm8@sanger.ac.uk

    Genetic variants predicted to seriously disrupt the function of human protein-coding genes--so-called loss-of-function (LOF) variants--have traditionally been viewed in the context of severe Mendelian disease. However, recent large-scale sequencing and genotyping projects have revealed a surprisingly large number of these variants in the genomes of apparently healthy individuals--at least 100 per genome, including more than 30 in a homozygous state--suggesting a previously unappreciated level of variation in functional gene content between humans. These variants are mostly found at low frequency, suggesting that they are enriched for mildly deleterious polymorphisms suppressed by negative natural selection, and thus represent an attractive set of candidate variants for complex disease susceptibility. However, they are also enriched for sequencing and annotation artefacts, so overall present serious challenges for clinical sequencing projects seeking to identify severe disease genes amidst the 'noise' of technical error and benign genetic polymorphism. Systematic, high-quality catalogues of LOF variants present in the genomes of healthy individuals, built from the output of large-scale sequencing studies such as the 1000 Genomes Project, will help to distinguish between benign and disease-causing LOF variants, and will provide valuable resources for clinical genomics.

    Funded by: Wellcome Trust

    Human molecular genetics 2010;19;R2;R125-30

  • Dysregulated humoral immunity to nontyphoidal Salmonella in HIV-infected African adults.

    MacLennan CA, Gilchrist JJ, Gordon MA, Cunningham AF, Cobbold M, Goodall M, Kingsley RA, van Oosterhout JJ, Msefula CL, Mandala WL, Leyton DL, Marshall JL, Gondwe EN, Bobat S, López-Macías C, Doffinger R, Henderson IR, Zijlstra EE, Dougan G, Drayson MT, MacLennan IC and Molyneux ME

    Medical Research Council Centre for Immune Regulation and Clinical Immunology Service, Institute of Biomedical Research, School of Immunity and Infection, University of Birmingham, Birmingham, UK. c.maclennan@bham.ac.uk

    Nontyphoidal Salmonellae are a major cause of life-threatening bacteremia among HIV-infected individuals. Although cell-mediated immunity controls intracellular infection, antibodies protect against Salmonella bacteremia. We report that high-titer antibodies specific for Salmonella lipopolysaccharide (LPS) are associated with a lack of Salmonella-killing in HIV-infected African adults. Killing was restored by genetically shortening LPS from the target Salmonella or removing LPS-specific antibodies from serum. Complement-mediated killing of Salmonella by healthy serum is shown to be induced specifically by antibodies against outer membrane proteins. This killing is lost when excess antibody against Salmonella LPS is added. Thus, our study indicates that impaired immunity against nontyphoidal Salmonella bacteremia in HIV infection results from excess inhibitory antibodies against Salmonella LPS, whereas serum killing of Salmonella is induced by antibodies against outer membrane proteins.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    Science (New York, N.Y.) 2010;328;5977;508-12

  • Meeting report: a workshop on Best Practices in Genome Annotation.

    Madupu R, Brinkac LM, Harrow J, Wilming LG, Böhme U, Lamesch P and Hannick LI

    Informatics, J. Craig Venter Institute, Rockville, MD 20850 USA, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK and The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA 94305 USA.

    Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the 'Best Practices in Genome Annotation: Inference from Evidence' workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.

    Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198

    Database : the journal of biological databases and curation 2010;2010;baq001

  • FRT-seq: amplification-free, strand-specific transcriptome sequencing.

    Mamanova L, Andrews RM, James KD, Sheridan EM, Ellis PD, Langford CF, Ost TW, Collins JE and Turner DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We report an alternative approach to transcriptome sequencing for the Illumina Genome Analyzer, in which the reverse transcription reaction takes place on the flowcell. No amplification is performed during the library preparation, so PCR biases and duplicates are avoided, and because the template is poly(A)+ RNA rather than cDNA, the resulting sequences are necessarily strand-specific. The method is compatible with paired- or single-end sequencing.

    Funded by: Wellcome Trust: 079643, WT079643

    Nature methods 2010;7;2;130-2

  • Target-enrichment strategies for next-generation sequencing.

    Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J and Turner DJ

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.

    Funded by: NHGRI NIH HHS: 5R21HG004749, R21 HG004749; NHLBI NIH HHS: 5R01HL094976, R01 HL094976; Wellcome Trust: WT079643

    Nature methods 2010;7;2;111-8

  • Construction of a large extracellular protein interaction network and its resolution by spatiotemporal expression profiling.

    Martin S, Söllner C, Charoensawan V, Adryan B, Thisse B, Thisse C, Teichmann S and Wright GJ

    Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.

    Extracellular interactions involving both secreted and membrane-tethered receptor proteins are essential to initiate signaling pathways that orchestrate cellular behaviors within biological systems. Because of the biochemical properties of these proteins and their interactions, identifying novel extracellular interactions remains experimentally challenging. To address this, we have recently developed an assay, AVEXIS (avidity-based extracellular interaction screen) to detect low affinity extracellular interactions on a large scale and have begun to construct interaction networks between zebrafish receptors belonging to the immunoglobulin and leucine-rich repeat protein families to identify novel signaling pathways important for early development. Here, we expanded our zebrafish protein library to include other domain families and many more secreted proteins and performed our largest screen to date totaling 16,544 potential unique interactions. We report 111 interactions of which 96 are novel and include the first documented extracellular ligands for 15 proteins. By including 77 interactions from previous screens, we assembled an expanded network of 188 extracellular interactions between 92 proteins and used it to show that secreted proteins have twice as many interaction partners as membrane-tethered receptors and that the connectivity of the extracellular network behaves as a power law. To try to understand the functional role of these interactions, we determined new expression patterns for 164 genes within our clone library by using whole embryo in situ hybridization at five key stages of zebrafish embryonic development. These expression data were integrated with the binding network to reveal where each interaction was likely to function within the embryo and were used to resolve the static interaction network into dynamic tissue- and stage-specific subnetworks within the developing zebrafish embryo. All these data were organized into a freely accessible on-line database called ARNIE (AVEXIS Receptor Network with Integrated Expression; www.sanger.ac.uk/arnie) and provide a valuable resource of new extracellular signaling interactions for developmental biology.

    Funded by: Medical Research Council; Wellcome Trust: 077108/Z/05/Z

    Molecular & cellular proteomics : MCP 2010;9;12;2654-65

  • Novel candidate cancer genes identified by a large-scale cross-species comparative oncogenomics approach.

    Mattison J, Kool J, Uren AG, de Ridder J, Wessels L, Jonkers J, Bignell GR, Butler A, Rust AG, Brosch M, Wilson CH, van der Weyden L, Largaespada DA, Stratton MR, Futreal PA, van Lohuizen M, Berns A, Collier LS, Hubbard T and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Comparative genomic hybridization (CGH) can reveal important disease genes but the large regions identified could sometimes contain hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the murine leukemia virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes mutated in both human and mouse tumors, making them strong candidates for novel cancer genes. A significant number of these genes contained binding sites for the stem cell transcription factors Oct4 and Nanog. Notably, mice carrying tumors with insertions in or near stem cell module genes, which are thought to participate in cell self-renewal, died significantly faster than mice without these insertions. A comparison of the profile we identified to that induced with the Sleeping Beauty (SB) transposon system revealed significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of new candidate cancer genes for functional analysis.

    Funded by: Cancer Research UK: A6997, A8784; NCI NIH HHS: K01 CA122183, K01CA122183, R01 CA113636, R01 CA134759; Wellcome Trust: 077198, 082356

    Cancer research 2010;70;3;883-95

  • Regulation of the Epstein-Barr virus Zp promoter in B lymphocytes during reactivation from latency.

    McDonald C, Karstegl CE, Kellam P and Farrell PJ

    Department of Virology, Imperial College Faculty of Medicine, St Mary's Campus, London W2 1PG, UK.

    Ten novel mutations were introduced into the Zp promoter to test the role of sequences outside the established transcription factor-binding sites in Epstein-Barr virus (EBV) reactivation. Most of these had only small effects, but mutations in the ZID site were shown to reduce Zp activity strongly at early times after induction by anti-immunoglobulin (anti-Ig). The binding of MEF2 transcription factor to ZID was characterized in detail and linked functionally to Zp promoter activity. The presence of XBP-1s, the active form of XBP-1, after administration of anti-Ig to Akata Burkitt's lymphoma cells is consistent with a role for this factor in reactivation of the EBV lytic cycle, although signalling through MEF2D was quantitatively much more significant in activation of Zp. Silencing of Zp during latency is thought to be primarily a consequence of a repressive chromatin structure on Zp, and this aspect of Zp regulation can be observed in the Akata genome through protection of Zp from activation by BZLF1 in the absence of signalling from the B-cell receptor.

    The Journal of general virology 2010;91;Pt 3;622-9

  • Visualizing chromosome mosaicism and detecting ethnic outliers by the method of "rare" heterozygotes and homozygotes (RHH).

    McGinnis RE, Deloukas P, McLaren WM and Inouye M

    Wellcome Trust Sanger Institute, Cambridge, UK. rm2@sanger.ac.uk

    We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (<1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity imply that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.

    Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113

    Human molecular genetics 2010;19;13;2539-53
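
    The counting step described in the abstract above can be illustrated with a minimal sketch. It assumes genotype calls are held as simple strings per SNP and treats a genotype as "rare" when fewer than 1% of called subjects carry it at that SNP. The function name, data layout and handling of missing calls are illustrative assumptions, not the published RHH implementation.

```python
from collections import Counter


def rhh_counts(genotypes, max_freq=0.01):
    """Count rare genotypes per subject (hypothetical helper).

    genotypes: dict mapping subject id -> list of genotype calls, one per SNP
               (for example "AA", "AG", "GG"; None marks a missing call).
    Returns a dict mapping subject id -> number of SNPs at which the subject
    carries a genotype whose frequency in the sample set is below max_freq.
    """
    subjects = list(genotypes)
    n_snps = len(genotypes[subjects[0]]) if subjects else 0
    counts = {s: 0 for s in subjects}

    for i in range(n_snps):
        calls = [genotypes[s][i] for s in subjects]
        observed = [c for c in calls if c is not None]
        if not observed:
            continue  # nothing was called at this SNP
        freq = Counter(observed)
        n_called = len(observed)
        for subject, call in zip(subjects, calls):
            # A call is "rare" when few subjects in the set share it.
            if call is not None and freq[call] / n_called < max_freq:
                counts[subject] += 1
    return counts
```

    Subjects whose count is far above the bulk of the sample set would be the candidate outliers; where to draw that line is left open here.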

  • Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

    McLaren W, Pritchard B, Rios D, Chen Y, Flicek P and Cunningham F

    European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. wm2@ebi.ac.uk

    Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API interface can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.

    Availability: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2010;26;16;2069-70

  • Genome-wide association studies of serum magnesium, potassium, and sodium concentrations identify six Loci influencing serum magnesium levels.

    Meyer TE, Verwoert GC, Hwang SJ, Glazer NL, Smith AV, van Rooij FJ, Ehret GB, Boerwinkle E, Felix JF, Leak TS, Harris TB, Yang Q, Dehghan A, Aspelund T, Katz R, Homuth G, Kocher T, Rettig R, Ried JS, Gieger C, Prucha H, Pfeufer A, Meitinger T, Coresh J, Hofman A, Sarnak MJ, Chen YD, Uitterlinden AG, Chakravarti A, Psaty BM, van Duijn CM, Kao WH, Witteman JC, Gudnason V, Siscovick DS, Fox CS, Köttgen A, Genetic Factors for Osteoporosis Consortium and Meta Analysis of Glucose and Insulin Related Traits Consortium

    Human Genetics Center and Division of Epidemiology, The University of Texas Health Science Center at Houston, School of Public Health, Houston, Texas, USA.

    Magnesium, potassium, and sodium, cations commonly measured in serum, are involved in many physiological processes including energy metabolism, nerve and muscle function, signal transduction, and fluid and blood pressure regulation. To evaluate the contribution of common genetic variation to normal physiologic variation in serum concentrations of these cations, we conducted genome-wide association studies of serum magnesium, potassium, and sodium concentrations using approximately 2.5 million genotyped and imputed common single nucleotide polymorphisms (SNPs) in 15,366 participants of European descent from the international CHARGE Consortium. Study-specific results were combined using fixed-effects inverse-variance weighted meta-analysis. SNPs demonstrating genome-wide significant (p<5 x 10(-8)) or suggestive associations (p<4 x 10(-7)) were evaluated for replication in an additional 8,463 subjects of European descent. The association of common variants at six genomic regions (in or near MUC1, ATP2B1, DCDC5, TRPM6, SHROOM3, and MDS1) with serum magnesium levels was genome-wide significant when meta-analyzed with the replication dataset. All initially significant SNPs from the CHARGE Consortium showed nominal association with clinically defined hypomagnesemia, two showed association with kidney function, two with bone mineral density, and one of these also associated with fasting glucose levels. Common variants in CNNM2, a magnesium transporter studied only in model systems to date, as well as in CNNM3 and CNNM4, were also associated with magnesium concentrations in this study. We observed no associations with serum sodium or potassium levels exceeding p<4 x 10(-7). Follow-up studies of newly implicated genomic loci may provide additional insights into the regulation and homeostasis of human serum magnesium levels.

    Funded by: NCRR NIH HHS: M01-RR00425, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-6-4278, R01 HL087652, R01HL087641, U01 HL080295; NIA NIH HHS: N01-AG-12100; NIDDK NIH HHS: DK063491

    PLoS genetics 2010;6;8
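
    The abstract above states that study-specific results were combined by fixed-effects inverse-variance weighted meta-analysis. A minimal sketch of that standard calculation follows, assuming each study supplies an effect estimate and its standard error for one SNP. The function name and the normal-approximation p-value are illustrative choices, not the consortium's actual pipeline.

```python
import math


def fixed_effects_meta(betas, ses):
    """Combine per-study effects with inverse-variance weights (sketch).

    betas: per-study effect estimates for one SNP.
    ses:   matching per-study standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p
```

    Under this scheme a SNP would be called genome-wide significant when the pooled p-value falls below the 5 x 10(-8) threshold quoted in the abstract.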

  • Transcriptome genetics using second generation sequencing in a Caucasian population.

    Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R and Dermitzakis ET

    Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211 Switzerland. stephen.montgomery@unige.ch

    Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (single nucleotide polymorphisms) leads to a larger discovery of eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.

    Funded by: Wellcome Trust: 077046

    Nature 2010;464;7289;773-7

  • An evaluation of statistical approaches to rare variant analysis in genetic association studies.

    Morris AP and Zeggini E

    Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom. amorris@well.ox.ac.uk

    Genome-wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of, as yet, undiscovered genetic determinants of complex traits are those mediated through the effects of rare variants. With the increasing availability of large-scale re-sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population-based genetic studies when data are available from re-sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re-sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits.

    Funded by: Wellcome Trust: 064890, 081682, WT081682/Z/06/Z, WT088885/Z/09/Z

    Genetic epidemiology 2010;34;2;188-93

  • Evoker: a visualization tool for genotype intensity data.

    Morris JA, Randall JC, Maller JB and Barrett JC

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.

    Summary: Genome-wide association studies (GWAS), which produce huge volumes of data, are now being carried out by many groups around the world, creating a need for user-friendly tools for data quality control (QC) and analysis. One critical aspect of GWAS QC is evaluating genotype cluster plots to verify sensible genotype calling in putatively associated single nucleotide polymorphisms (SNPs). Evoker is a tool for visualizing genotype cluster plots, and provides a solution to the computational and storage problems related to working with such large datasets.

    Availability: http://www.sanger.ac.uk/resources/software/evoker/

    Funded by: Wellcome Trust: 089120, WT08912/Z/09/Z

    Bioinformatics (Oxford, England) 2010;26;14;1786-7

  • Methods for Improving Genome Annotation

    Mudge J and Harrow J

    Knowledge-Based Bioinformatics: From Analysis to Interpretation 2010;Chapter 9;209-32

  • Interactions of dietary whole-grain intake with fasting glucose- and insulin-related genetic loci in individuals of European descent: a meta-analysis of 14 cohort studies.

    Nettleton JA, McKeown NM, Kanoni S, Lemaitre RN, Hivert MF, Ngwa J, van Rooij FJ, Sonestedt E, Wojczynski MK, Ye Z, Tanaka T, Garcia M, Anderson JS, Follis JL, Djousse L, Mukamal K, Papoutsakis C, Mozaffarian D, Zillikens MC, Bandinelli S, Bennett AJ, Borecki IB, Feitosa MF, Ferrucci L, Forouhi NG, Groves CJ, Hallmans G, Harris T, Hofman A, Houston DK, Hu FB, Johansson I, Kritchevsky SB, Langenberg C, Launer L, Liu Y, Loos RJ, Nalls M, Orho-Melander M, Renstrom F, Rice K, Riserus U, Rolandsson O, Rotter JI, Saylor G, Sijbrands EJ, Sjogren P, Smith A, Steingrímsdóttir L, Uitterlinden AG, Wareham NJ, Prokopenko I, Pankow JS, van Duijn CM, Florez JC, Witteman JC, MAGIC Investigators, Dupuis J, Dedoussis GV, Ordovas JM, Ingelsson E, Cupples L, Siscovick DS, Franks PW and Meigs JB

    Division of Epidemiology, Human Genetics, and Environmental Sciences, University of Texas Health Sciences Center, Houston, Houston, Texas, USA. jennifer.a.nettleton@uth.tmc.edu

    Objective: Whole-grain foods are touted for multiple health benefits, including enhancing insulin sensitivity and reducing type 2 diabetes risk. Recent genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with fasting glucose and insulin concentrations in individuals free of diabetes. We tested the hypothesis that whole-grain food intake and genetic variation interact to influence concentrations of fasting glucose and insulin.

    Research design and methods: Via meta-analysis of data from 14 cohorts comprising ∼48,000 participants of European descent, we studied interactions of whole-grain intake with loci previously associated in GWAS with fasting glucose (16 loci) and/or insulin (2 loci) concentrations. For tests of interaction, we considered a P value <0.0028 (0.05/18 tests) as statistically significant.

    Results: Greater whole-grain food intake was associated with lower fasting glucose and insulin concentrations independent of demographics, other dietary and lifestyle factors, and BMI (β [95% CI] per 1-serving-greater whole-grain intake: -0.009 mmol/l glucose [-0.013 to -0.005], P < 0.0001 and -0.011 pmol/l [ln] insulin [-0.015 to -0.007], P = 0.0003). No interactions met our multiple testing-adjusted statistical significance threshold. The strongest SNP interaction with whole-grain intake was rs780094 (GCKR) for fasting insulin (P = 0.006), where greater whole-grain intake was associated with a smaller reduction in fasting insulin concentrations in those with the insulin-raising allele.

    Conclusions: Our results support the favorable association of whole-grain intake with fasting glucose and insulin and suggest a potential interaction between variation in GCKR and whole-grain intake in influencing fasting insulin concentrations.

    Funded by: British Heart Foundation: RG/07/008/23674; Medical Research Council: G0100222, G0701863, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015, U1475000002; NIA NIH HHS: R01 AG032098

    Diabetes care 2010;33;12;2684-91
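
    The significance threshold quoted in the abstract above is the nominal 0.05 level divided by the 18 loci tested. The short sketch below reproduces that arithmetic and checks the strongest reported interaction against it; the function name is an illustrative choice.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after dividing alpha across n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 16 fasting glucose loci plus 2 fasting insulin loci give 18 tests.
threshold = bonferroni_threshold(0.05, 16 + 2)
print(round(threshold, 4))  # prints 0.0028

# The strongest interaction (rs780094, P = 0.006) does not pass.
print(0.006 < threshold)  # prints False
```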

  • Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes.

    Neumann B, Walter T, Hériché JK, Bulkescher J, Erfle H, Conrad C, Rogers P, Poser I, Held M, Liebel U, Cetin C, Sieckmann F, Pau G, Kabbe R, Wünsche A, Satagopam V, Schmitz MH, Chapuis C, Gerlich DW, Schneider R, Eils R, Huber W, Peters JM, Hyman AA, Durbin R, Pepperkok R and Ellenberg J

    MitoCheck Project Group, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, D-69117 Heidelberg, Germany.

    Despite our rapidly growing knowledge about the human genome, we do not know all of the genes required for some of the most basic functions of life. To start to fill this gap we developed a high-throughput phenotypic screening platform combining potent gene silencing by RNA interference, time-lapse microscopy and computational image processing. We carried out a genome-wide phenotypic profiling of each of the approximately 21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes. Phenotypes were scored quantitatively by computational image processing, which allowed us to identify hundreds of human genes involved in diverse biological functions including cell division, migration and survival. As part of the Mitocheck consortium, this study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.

    Funded by: Wellcome Trust: 077192

    Nature 2010;464;7289;721-7

  • Laser excitation power and the flow cytometric resolution of complex karyotypes.

    Ng BL and Carter NP

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom. bln@sanger.ac.uk

    The analytical resolution of individual chromosome peaks in the flow karyotype of cell lines is dependent on sample preparation and the detection sensitivity of the flow cytometer. We have investigated the effect of laser power on the resolution of chromosome peaks in cell lines with complex karyotypes. Chromosomes were prepared from a human gastric cancer cell line and a cell line from a patient with an abnormal phenotype using a modified polyamine isolation buffer. The stained chromosome suspensions were analyzed on a MoFlo sorter (Beckman Coulter) equipped with two water-cooled lasers (Coherent). A bivariate flow karyotype was obtained from each of the cell lines at various laser power settings and compared to a karyotype generated using laser power settings of 300 mW. The best separation of chromosome peaks was obtained with laser powers of 300 mW. This study demonstrates the requirement for high laser powers for the accurate detection and purification of chromosomes, particularly from complex karyotypes, using a conventional flow cytometer.

    Funded by: Wellcome Trust: 079643, WT077008

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2010;77;6;585-8

  • The sudden dominance of blaCTX-M harbouring plasmids in Shigella spp. circulating in Southern Vietnam.

    Nguyen NT, Ha V, Tran NV, Stabler R, Pham DT, Le TM, van Doorn HR, Cerdeño-Tárraga A, Thomson N, Campbell J, Nguyen VM, Tran TT, Pham MV, Cao TT, Wren B, Farrar J and Baker S

    The Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.

    Background: Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges.

    Methodology: We collected hospital admission data from the pediatric population attending the Hospital for Tropical Diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla(CTX-M) encoding plasmid.

    We show that two different bla(CTX-M) genes are circulating in this bacterial population in this location. Sequence of one of the ESBL plasmids shows that rather than the gene being integrated into a preexisting MDR plasmid, the bla(CTX-M) gene is located on a relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla(CTX-M-24) gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids.

    Significance: The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting.

    Funded by: Medical Research Council: G0300020; Wellcome Trust

    PLoS neglected tropical diseases 2010;4;6;e702

  • Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations.

    Nica AC, Montgomery SB, Dimas AS, Stranger BE, Beazley C, Barroso I and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC) that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with both a single or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r(2)) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal to prioritize relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.

    Funded by: Wellcome Trust

    PLoS genetics 2010;6;4;e1000895

  • Salmonella enterica serovar Typhimurium mutants completely lacking the F(0)F(1) ATPase are novel live attenuated vaccine strains.

    Northen H, Paterson GK, Constantino-Casas F, Bryant CE, Clare S, Mastroeni P, Peters SE and Maskell DJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    The F(0)F(1) ATPase plays a central role in both the generation of ATP and the utilisation of ATP for cellular processes such as rotation of bacterial flagella. We have deleted the entire operon encoding the F(0)F(1) ATPase, as well as genes encoding individual F(0) or F(1) subunits, in Salmonella enterica serovar Typhimurium. These mutants were attenuated for virulence, as assessed by bacterial counts in the livers and spleens of intravenously infected mice. The attenuated in vivo growth of the entire atp operon mutant was complemented by the insertion of the atp operon into the malXY pseudogene region. Following clearance of the attenuated mutants from the organs, mice were protected against challenge with the virulent wild type parent strain. We have shown that the F(0)F(1) ATPase is important for bacterial growth in vivo and that atp mutants are effective live attenuated vaccines against Salmonella infection.

    Funded by: Biotechnology and Biological Sciences Research Council: BBS/S/N/2006/13095; Wellcome Trust

    Vaccine 2010;28;4;940-9

  • Synthetic associations in the context of genome-wide association scan signals.

    Orozco G, Barrett JC and Zeggini E

    Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester, UK.

    Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with complex traits, but these only explain a small proportion of the total heritability. It has been recently proposed that rare variants can create 'synthetic association' signals in GWAS, by occurring more often in association with one of the alleles of a common tag single nucleotide polymorphism. While the ultimate evaluation of this hypothesis will require the completion of large-scale sequencing studies, it is informative to place it in the broader context of what is known about the genetic architecture of complex disease. In this review, we draw from empirical and theoretical data to summarize evidence showing that synthetic associations do not underlie many reported GWAS associations.

    Funded by: Wellcome Trust: WT088885/Z/09/Z, WT089120/Z/09/Z

    Human molecular genetics 2010;19;R2;R137-44

  • Dual RMCE for efficient re-engineering of mouse mutant alleles.

    Osterwalder M, Galli A, Rosen B, Skarnes WC, Zeller R and Lopez-Rios J

    Developmental Genetics, Department of Biomedicine, University of Basel, Basel, Switzerland.

    We have developed dual recombinase-mediated cassette exchange (dRMCE) to efficiently re-engineer the thousands of available conditional alleles in mouse embryonic stem cells. dRMCE takes advantage of the wild-type loxP and FRT sites present in these conditional alleles and in many gene-trap lines. dRMCE is a scalable, flexible tool to introduce tags, reporters and mutant coding regions into an endogenous locus of interest in an easy and highly efficient manner.

    Funded by: Wellcome Trust: 077188

    Nature methods 2010;7;11;893-5

  • Thioredoxin and glutathione systems differ in parasitic and free-living platyhelminths.

    Otero L, Bonilla M, Protasio AV, Fernández C, Gladyshev VN and Salinas G

    Cátedra de Inmunología, Facultad de Química, Instituto de Higiene, Universidad de la República, Avda. A. Navarro 3051, Montevideo, Uruguay.

    Background: The thioredoxin and/or glutathione pathways occur in all organisms. They provide electrons for deoxyribonucleotide synthesis, function in antioxidant defense, detoxification and Fe/S biogenesis, and participate in a variety of cellular processes. In contrast to their mammalian hosts, platyhelminth (flatworm) parasites studied so far lack conventional thioredoxin and glutathione systems. Instead, they possess a linked thioredoxin-glutathione system with the selenocysteine-containing enzyme thioredoxin glutathione reductase (TGR) as the single redox hub that controls the overall redox homeostasis. TGR has been recently validated as a drug target for schistosomiasis and new drug leads targeting TGR have recently been identified for these platyhelminth infections that affect more than 200 million people and for which a single drug is currently available. Little is known regarding the genomic structure of flatworm TGRs, the expression of TGR variants and whether the absence of conventional thioredoxin and glutathione systems is a signature of the entire platyhelminth phylum.

    Results: We examine platyhelminth genomes and transcriptomes and find that all platyhelminth parasites (from classes Cestoda and Trematoda) conform to a biochemical scenario involving, exclusively, a selenium-dependent linked thioredoxin-glutathione system having TGR as a central redox hub. In contrast, the free-living platyhelminth Schmidtea mediterranea (Class Turbellaria) possesses conventional and linked thioredoxin and glutathione systems. We identify TGR variants in Schistosoma spp. derived from a single gene, and demonstrate their expression. We also provide experimental evidence that alternative initiation of transcription and alternative transcript processing contribute to the generation of TGR variants in platyhelminth parasites.

    Conclusions: Our results indicate that thioredoxin and glutathione pathways differ in parasitic and free-living flatworms and that canonical enzymes were specifically lost in the parasitic lineage. Platyhelminth parasites possess a unique and simplified redox system for diverse essential processes, and thus TGR is an excellent drug target for platyhelminth infections. Inhibition of the central redox wire hub would lead to overall disruption of redox homeostasis and disable DNA synthesis.

    Funded by: FIC NIH HHS: TW006959; NIGMS NIH HHS: GM065204; Wellcome Trust: WT 085775/Z/08/Z

    BMC genomics 2010;11;237

  • Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    Otto TD, Sanders M, Berriman M and Newbold C

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. tdo@sanger.ac.uk

    Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.

    Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.

    Availability: The software is available at http://icorn.sourceforge.net

    Funded by: Wellcome Trust: WT085775/Z/08/Z

    Bioinformatics (Oxford, England) 2010;26;14;1704-7
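
    The iterative idea described in the iCORN abstract above (align reads, correct the reference where the reads disagree with it, then repeat) can be illustrated with a toy sketch. This is not the iCORN implementation: the brute-force aligner, the majority-vote rule, and the names `best_offset`, `correct_once`, `iterative_correct`, `max_mismatches` and `min_cov` are all assumptions made for illustration only.

```python
from collections import Counter


def best_offset(ref, read, max_mismatches):
    """Toy aligner (assumption): offset with the fewest mismatches, or None."""
    best, best_mm = None, max_mismatches + 1
    for off in range(len(ref) - len(read) + 1):
        window = ref[off:off + len(read)]
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm < best_mm:  # ties keep the leftmost offset
            best, best_mm = off, mm
    return best


def correct_once(ref, reads, max_mismatches=1, min_cov=2):
    """One round: pile up aligned reads and apply majority-vote corrections."""
    pile = [Counter() for _ in ref]
    for read in reads:
        off = best_offset(ref, read, max_mismatches)
        if off is None:
            continue  # read does not align well enough in this round
        for i, base in enumerate(read):
            pile[off + i][base] += 1
    corrected, changes = list(ref), 0
    for i, counts in enumerate(pile):
        depth = sum(counts.values())
        if depth < min_cov:
            continue  # too little evidence to change this position
        base, n = counts.most_common(1)[0]
        if base != ref[i] and n > depth / 2:
            corrected[i] = base
            changes += 1
    return "".join(corrected), changes


def iterative_correct(ref, reads, max_iter=10, **kwargs):
    """Repeat correction rounds until a round makes no change."""
    for _ in range(max_iter):
        ref, changes = correct_once(ref, reads, **kwargs)
        if changes == 0:
            break
    return ref
```

    Re-aligning in every round matters because reads that carry too many mismatches against the uncorrected reference may align once neighbouring errors have been fixed.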

  • New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.

    Otto TD, Wilinski D, Assefa S, Keane TM, Sarry LR, Böhme U, Lemieux J, Barrell B, Pain A, Berriman M, Newbold C and Llinás M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

    Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z

    Molecular microbiology 2010;76;1;12-24

  • The RING-CH ligase K5 antagonizes restriction of KSHV and HIV-1 particle release by mediating ubiquitin-dependent endosomal degradation of tetherin.

    Pardieu C, Vigan R, Wilson SJ, Calvi A, Zang T, Bieniasz P, Kellam P, Towers GJ and Neil SJ

    MRC Centre for Medical Molecular Virology, University College London, London, United Kingdom.

    Tetherin (CD317/BST2) is an interferon-induced membrane protein that inhibits the release of diverse enveloped viral particles. Several mammalian viruses have evolved countermeasures that inactivate tetherin, with the prototype being the HIV-1 Vpu protein. Here we show that the human herpesvirus Kaposi's sarcoma-associated herpesvirus (KSHV) is sensitive to tetherin restriction and that tetherin's activity is counteracted by the KSHV-encoded RING-CH E3 ubiquitin ligase K5. Tetherin expression in KSHV-infected cells inhibits viral particle release, as does depletion of K5 protein using RNA interference. K5 induces a species-specific downregulation of human tetherin from the cell surface followed by its endosomal degradation. We show that K5 targets a single lysine (K18) in the cytoplasmic tail of tetherin for ubiquitination, leading to relocalization of tetherin to CD63-positive endosomal compartments. Tetherin degradation is dependent on ESCRT-mediated endosomal sorting, but does not require a tyrosine-based sorting signal in the tetherin cytoplasmic tail. Importantly, we also show that the ability of K5 to substitute for Vpu in HIV-1 release is entirely dependent on K18 and the RING-CH domain of K5. By contrast, while Vpu induces ubiquitination of tetherin cytoplasmic tail lysine residues, mutation of these positions has no effect on its antagonism of tetherin function, and residual tetherin is associated with the trans-Golgi network (TGN) in Vpu-expressing cells. Taken together, our results demonstrate that K5 is a mechanistically distinct viral countermeasure to tetherin-mediated restriction, and that herpesvirus particle release is sensitive to this mode of antiviral inhibition.

    Funded by: Medical Research Council: G0801172, G0801172(87743), G0801937, G9721629; Wellcome Trust: 076608, WT082274MA

    PLoS pathogens 2010;6;4;e1000843

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk

    The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.

    Funded by: Medical Research Council: MC_U105185859; Wellcome Trust

    Cell stem cell 2010;6;4;382-95

  • Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing.

    Park H, Kim JI, Ju YS, Gokcumen O, Mills RE, Kim S, Lee S, Suh D, Hong D, Kang HP, Yoo YJ, Shin JY, Kim HJ, Yavartanoo M, Chang YW, Ha JS, Chong W, Hwang GR, Darvishi K, Kim H, Yang SJ, Yang KS, Kim H, Hurles ME, Scherer SW, Carter NP, Tyler-Smith C, Lee C and Seo JS

    Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Korea.

    Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3x coverage) and two Asian genomes (AK1, with 27.8x coverage and AK2, with 32.0x coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.

    Funded by: NHGRI NIH HHS: HG004221; Wellcome Trust: 077008, 077009, 077014

    Nature genetics 2010;42;5;400-5
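
    The conversion from relative to absolute copy number described in the abstract above can be sketched as follows. This is a minimal illustration, assuming that array CGH reports a log2 ratio of sample to reference signal and that the reference copy number in a region is estimated from sequencing read depth; the function names and the simple depth model are hypothetical and are not taken from the paper.

```python
def reference_copies_from_depth(region_depth, genome_depth, ploidy=2):
    """Estimate reference copy number from read depth (assumed simple model)."""
    return ploidy * region_depth / genome_depth


def absolute_copy_number(log2_ratio, reference_copies):
    """Turn a sample/reference log2 ratio into an absolute copy number."""
    return reference_copies * 2 ** log2_ratio


# Example: a region covered at 14.15x in a 28.3x reference genome is
# estimated at one copy; a sample log2 ratio of +1 then implies two copies.
ref_copies = reference_copies_from_depth(14.15, 28.3)
sample_copies = absolute_copy_number(1.0, ref_copies)
```

    The point of the sketch is that a log2 ratio alone cannot distinguish, say, 1 versus 2 copies from 2 versus 4 copies; anchoring the reference to an absolute value resolves that ambiguity.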

  • Using caching and optimization techniques to improve performance of the Ensembl website.

    Parker A, Bragin E, Brent S, Pritchard B, Smith JA and Trevanion S

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK. ap5@sanger.ac.uk

    Background: The Ensembl web site has provided access to genomic information for almost 10 years. During this time the amount of data available through Ensembl has grown dramatically. At the same time, the World Wide Web itself has become a dramatically more important component of the scientific workflow and the way that scientists share and access data and scientific information. Since 2000, the Ensembl web interface has had three major updates and numerous smaller updates. These have largely been in response to expanding data types and valuable representations of existing data types. In 2007 it was realised that a radical new approach would be required in order to serve the project's future requirements, and development therefore focused on identifying suitable web technologies for implementation in the 2008 site redesign.

    Results: By comparing the Ensembl website to well-known "Web 2.0" sites, we were able to identify two main areas in which cutting-edge technologies could be advantageously deployed: server efficiency and interface latency. We then evaluated the performance of the existing site using browser-based tools and Apache benchmarking, and selected appropriate technologies to overcome any issues found. Solutions included optimization of the Apache web server, introduction of caching technologies and widespread implementation of AJAX code. These improvements were successfully deployed on the Ensembl website in late 2008 and early 2009.

    Conclusions: Web 2.0 technologies provide a flexible and efficient way to access the terabytes of data now available from Ensembl, enhancing the user experience through improved website responsiveness and a rich, interactive interface.

    BMC bioinformatics 2010;11;239
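
    As a generic illustration of the caching approach mentioned in the abstract above, the sketch below stores the result of an expensive page render for a fixed time so that repeated requests skip the work. It is not the Ensembl code; the class `TTLCache`, its method `get_or_compute`, and the injectable `clock` parameter are hypothetical names chosen for this example.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}     # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, or call compute() and cache it."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value
```

    A production deployment would typically use a shared cache server rather than per-process memory, so that all web server processes benefit from one another's work.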

  • A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose.

    Paterson AD, Waggott D, Boright AP, Hosseini SM, Shen E, Sylvestre MP, Wong I, Bharaj B, Cleary PA, Lachin JM, MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium), Below JE, Nicolae D, Cox NJ, Canty AJ, Sun L, Bull SB and Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group

    Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, Canada. andrew.paterson@utoronto.ca

    Objective: Glycemia is a major risk factor for the development of long-term complications in type 1 diabetes; however, no specific genetic loci have been identified for glycemic control in individuals with type 1 diabetes. To identify such loci in type 1 diabetes, we analyzed longitudinal repeated measures of A1C from the Diabetes Control and Complications Trial.

    Research design and methods: We performed a genome-wide association study using the mean of quarterly A1C values measured over 6.5 years, separately in the conventional (n = 667) and intensive (n = 637) treatment groups of the DCCT. At loci of interest, linear mixed models were used to take advantage of all the repeated measures. We then assessed the association of these loci with capillary glucose and repeated measures of multiple complications of diabetes.

    Results: We identified a major locus for A1C levels in the conventional treatment group near SORCS1 (10q25.1, P = 7 x 10(-10)), which was also associated with mean glucose (P = 2 x 10(-5)). This was confirmed using A1C in the intensive treatment group (P = 0.01). Other loci achieved evidence close to genome-wide significance: 14q32.13 (GSC) and 9p22 (BNC2) in the combined treatment groups and 15q21.3 (WDR72) in the intensive group. Further, these loci gave evidence for association with diabetic complications, specifically SORCS1 with hypoglycemia and BNC2 with renal and retinal complications. We replicated the SORCS1 association in Genetics of Diabetes in Kidneys (GoKinD) study control subjects (P = 0.01) and the BNC2 association with A1C in nondiabetic individuals.

    Conclusions: A major locus for A1C and glucose in individuals with diabetes is near SORCS1. This may influence the design and analysis of genetic studies attempting to identify risk factors for long-term diabetic complications.

    Funded by: Canadian Institutes of Health Research; NIDDK NIH HHS: N01-DK-6-2204, P60-DK20595, R01-DK-077510, R01-DK077489; NIGMS NIH HHS: T32 GM007197

    Diabetes 2010;59;2;539-49

  • Antagonistic coevolution accelerates molecular evolution.

    Paterson S, Vogwill T, Buckling A, Benmayor R, Spiers AJ, Thomson NR, Quail M, Smith F, Walker D, Libberton B, Fenton A, Hall N and Brockhurst MA

    School of Biological Sciences, Biosciences Building, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK.

    The Red Queen hypothesis proposes that coevolution of interacting species (such as hosts and parasites) should drive molecular evolution through continual natural selection for adaptation and counter-adaptation. Although the divergence observed at some host-resistance and parasite-infectivity genes is consistent with this, the long time periods typically required to study coevolution have so far prevented any direct empirical test. Here we show, using experimental populations of the bacterium Pseudomonas fluorescens SBW25 and its viral parasite, phage Phi2 (refs 10, 11), that the rate of molecular evolution in the phage was far higher when both bacterium and phage coevolved with each other than when phage evolved against a constant host genotype. Coevolution also resulted in far greater genetic divergence between replicate populations, which was correlated with the range of hosts that coevolved phage were able to infect. Consistent with this, the most rapidly evolving phage genes under coevolution were those involved in host infection. These results demonstrate, at both the genomic and phenotypic level, that antagonistic coevolution is a cause of rapid and divergent evolution, and is likely to be a major driver of evolutionary change within species.

    Funded by: Wellcome Trust

    Nature 2010;464;7286;275-8

  • Twenty-eight divergent polysaccharide loci specifying within- and amongst-strain capsule diversity in three strains of Bacteroides fragilis.

    Patrick S, Blakely GW, Houston S, Moore J, Abratt VR, Bertalan M, Cerdeño-Tárraga AM, Quail MA, Corton N, Corton C, Bignell A, Barron A, Clark L, Bentley SD and Parkhill J

    Centre for Infection and Immunity, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Medical Biology Centre, 97 Lisburn Road, Belfast BT9 7BL, UK. s.patrick@qub.ac.uk

    Comparison of the complete genome sequence of Bacteroides fragilis 638R, originally isolated in the USA, was made with two previously sequenced strains isolated in the UK (NCTC 9343) and Japan (YCH46). The presence of 10 loci containing genes associated with polysaccharide (PS) biosynthesis, each including a putative Wzx flippase and Wzy polymerase, was confirmed in all three strains, despite a lack of cross-reactivity between NCTC 9343 and 638R surface PS-specific antibodies by immunolabelling and microscopy. Genomic comparisons revealed an exceptional level of PS biosynthesis locus diversity. Of the 10 divergent PS-associated loci apparent in each strain, none is similar between NCTC 9343 and 638R. YCH46 shares one locus with NCTC 9343, confirmed by mAb labelling, and a second different locus with 638R, making a total of 28 divergent PS biosynthesis loci amongst the three strains. The lack of expression of the phase-variable large capsule (LC) in strain 638R, observed in NCTC 9343, is likely to be due to a point mutation that generates a stop codon within a putative initiating glycosyltransferase, necessary for the expression of the LC in NCTC 9343. Other major sequence differences were observed to arise from different numbers and variety of inserted extra-chromosomal elements, in particular prophages. Extensive horizontal gene transfer has occurred within these strains, despite the presence of a significant number of divergent DNA restriction and modification systems that act to prevent acquisition of foreign DNA. The level of amongst-strain diversity in PS biosynthesis loci is unprecedented.

    Funded by: Wellcome Trust: 061696

    Microbiology (Reading, England) 2010;156;Pt 11;3255-69

  • Genetic evidence that raised sex hormone binding globulin (SHBG) levels reduce the risk of type 2 diabetes.

    Perry JR, Weedon MN, Langenberg C, Jackson AU, Lyssenko V, Sparsø T, Thorleifsson G, Grallert H, Ferrucci L, Maggio M, Paolisso G, Walker M, Palmer CN, Payne F, Young E, Herder C, Narisu N, Morken MA, Bonnycastle LL, Owen KR, Shields B, Knight B, Bennett A, Groves CJ, Ruokonen A, Jarvelin MR, Pearson E, Pascoe L, Ferrannini E, Bornstein SR, Stringham HM, Scott LJ, Kuusisto J, Nilsson P, Neptin M, Gjesing AP, Pisinger C, Lauritzen T, Sandbaek A, Sampson M, MAGIC, Zeggini E, Lindgren CM, Steinthorsdottir V, Thorsteinsdottir U, Hansen T, Schwarz P, Illig T, Laakso M, Stefansson K, Morris AD, Groop L, Pedersen O, Boehnke M, Barroso I, Wareham NJ, Hattersley AT, McCarthy MI and Frayling TM

    Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Magdalen Road, Exeter, UK.

    Epidemiological studies consistently show that circulating sex hormone binding globulin (SHBG) levels are lower in type 2 diabetes patients than in non-diabetic individuals, but the causal nature of this association is controversial. Genetic studies can help dissect causal directions of epidemiological associations because genotypes are much less likely to be confounded, biased or influenced by disease processes. Using this Mendelian randomization principle, we selected a common single nucleotide polymorphism (SNP) near the SHBG gene, rs1799941, that is strongly associated with SHBG levels. We used data from this SNP, or closely correlated SNPs, in 27,657 type 2 diabetes patients and 58,481 controls from 15 studies. We then used data from additional studies to estimate the difference in SHBG levels between type 2 diabetes patients and controls. The SHBG SNP rs1799941 was associated with type 2 diabetes [odds ratio (OR) 0.94, 95% CI: 0.91, 0.97; P = 2 x 10(-5)], with the SHBG-raising allele associated with reduced risk of type 2 diabetes. This effect was very similar to that expected (OR 0.92, 95% CI: 0.88, 0.96), given the SHBG-SNP versus SHBG levels association (SHBG levels are 0.2 standard deviations higher per copy of the A allele) and the SHBG levels versus type 2 diabetes association (SHBG levels are 0.23 standard deviations lower in type 2 diabetic patients compared to controls). Results were very similar in men and women. There was no evidence that this variant is associated with diabetes-related intermediate traits, including several measures of insulin secretion and resistance. Our results, together with those from another recent genetic study, strengthen evidence that SHBG and sex hormones are involved in the aetiology of type 2 diabetes.

    Funded by: Department of Health: DHCS/07/07/008; Medical Research Council: G0000649, G016121, G0601261, MC_U106179471; NHGRI NIH HHS: 1 Z01 HG000024; NIA NIH HHS: R01 AG24233-0; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193; Wellcome Trust: 076113, 077016/Z/05/Z, 083270/Z/07/Z, 090532, GR072960

    Human molecular genetics 2010;19;3;535-44
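
    The comparison of observed and expected odds ratios in the abstract above can be written out as a short worked equation. This is a sketch under a stated assumption: that SHBG acts log-linearly on type 2 diabetes risk. The per-standard-deviation odds ratio below is back-calculated from the reported numbers and is not a figure given in the abstract.

```latex
% Assumption: log-linear effect of SHBG level on type 2 diabetes risk.
\log \mathrm{OR}_{\text{expected, per allele}}
  = \beta_{\text{SNP}\to\text{SHBG}} \times \log \mathrm{OR}_{\text{per SD of SHBG}}

% With \beta = 0.2 SD per A allele and an expected per-allele OR of 0.92:
\mathrm{OR}_{\text{per SD of SHBG}} = 0.92^{1/0.2} = 0.92^{5} \approx 0.66
```

    Under that assumption, the observed per-allele OR of 0.94 lies inside the expected interval (0.88 to 0.96), and the expected OR of 0.92 lies inside the observed interval (0.91 to 0.97), which is the sense in which the two estimates agree.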

  • The Citrobacter rodentium genome sequence reveals convergent evolution with human pathogenic Escherichia coli.

    Petty NK, Bulgin R, Crepin VF, Cerdeño-Tárraga AM, Schroeder GN, Quail MA, Lennard N, Corton C, Barron A, Clark L, Toribio AL, Parkhill J, Dougan G, Frankel G and Thomson NR

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

    Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.

    Funded by: Medical Research Council: G0700823

    Journal of bacteriology 2010;192;2;525-38

  • A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.

    Pickard D, Toribio AL, Petty NK, van Tonder A, Yu L, Goulding D, Barrell B, Rance R, Harris D, Wetter M, Wain J, Choudhary J, Thomson N and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk

    A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;21;5746-54

  • Metamotifs--a generative model for building families of nucleotide position weight matrices.

    Piipari M, Down TA and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. matias.piipari@gmail.com

    Background: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence.

    Results: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.

    Conclusions: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity of motif inference. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with a given motif. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.

    Funded by: Wellcome Trust: 077198, 077198/Z/05/Z

    BMC bioinformatics 2010;11;348
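
    To make the idea of a probabilistic model over PWM columns in the abstract above more concrete, the sketch below scores a PWM against a per-column prior. It assumes, for illustration only, that each metamotif column is a Dirichlet distribution over the four nucleotide weights; the abstract does not spell out this detail, and the function names are hypothetical rather than part of NestedMICA.

```python
from math import lgamma, log


def dirichlet_log_density(column, alpha):
    """Log density of one PWM column (A, C, G, T weights) under Dirichlet(alpha).

    All weights must be strictly positive and sum to 1.
    """
    if abs(sum(column) - 1.0) > 1e-9:
        raise ValueError("PWM column must sum to 1")
    if min(column) <= 0.0:
        raise ValueError("PWM column weights must be positive")
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * log(p) for a, p in zip(alpha, column))


def metamotif_log_density(pwm, metamotif):
    """Sum of per-column log densities, treating columns as independent."""
    if len(pwm) != len(metamotif):
        raise ValueError("PWM and metamotif must have the same number of columns")
    return sum(dirichlet_log_density(c, a) for c, a in zip(pwm, metamotif))
```

    A column prior such as (8, 1, 1, 1) concentrates density on A-rich columns, so a PWM whose column favours A scores higher than one favouring C, which is the sense in which such a prior could encode a familial pattern.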

  • iMotifs: an integrated sequence motif visualization and analysis environment.

    Piipari M, Down TA, Saini H, Enright A and Hubbard TJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. matias.piipari@gmail.com

    Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.

    Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.

    Contact: matias.piipari@gmail.com; imotifs@googlegroups.com.

    Funded by: Wellcome Trust: 077198, 077198/Z/05/Z

    Bioinformatics (Oxford, England) 2010;26;6;843-4

  • A comprehensive catalogue of somatic mutations from a human cancer genome.

    Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin ML, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, Mudie LJ, Ning Z, Royce T, Schulz-Trieglaff OB, Spiridou A, Stebbings LA, Szajkowski L, Teague J, Williamson D, Chin L, Ross MT, Campbell PJ, Bentley DR, Futreal PA and Stratton MR

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

    Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7278;191-6

  • A small-cell lung cancer genome with complex signatures of tobacco exposure.

    Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, Costa GL, Lee CC, Minna JD, Gazdar A, Birney E, Rhodes MD, McKernan KJ, Stratton MR, Futreal PA and Campbell PJ

    Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

    Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and another two lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.

    Funded by: NCI NIH HHS: P50CA70907; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867

    Nature 2010;463;7278;184-90
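
    Mutation signatures of the kind discussed in the two cancer genome abstracts above are usually summarised by first collapsing each substitution onto its pyrimidine base, giving six classes. The sketch below shows that bookkeeping step only; it is a common convention rather than a method stated in the abstracts, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution to one of six pyrimidine-referenced classes."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in "AG":
        # Purine reference: describe the change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(substitutions):
    """Count substitutions per class from (ref, alt) pairs."""
    return Counter(substitution_class(r, a) for r, a in substitutions)
```

    For example, G>A and C>T are the same event seen from opposite strands, so both are counted as C>T. Real signature analyses also take the neighbouring bases into account, which this sketch omits.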

  • PARK2 deletions occur frequently in sporadic colorectal cancer and accelerate adenoma development in Apc mutant mice.

    Poulogiannis G, McIntyre RE, Dimitriadi M, Apps JR, Wilson CH, Ichimura K, Luo F, Cantley LC, Wyllie AH, Adams DJ and Arends MJ

    Department of Pathology, University of Cambridge, Cambridge CB2 0QQ, United Kingdom.

    In 100 primary colorectal carcinomas, we demonstrate by array comparative genomic hybridization (aCGH) that 33% show DNA copy number (DCN) loss involving PARK2, the gene encoding PARKIN, the E3 ubiquitin ligase whose deficiency is responsible for a form of autosomal recessive juvenile parkinsonism. PARK2 is located on chromosome 6 (at 6q25-27), a chromosome with one of the lowest overall frequencies of DNA copy number alterations recorded in colorectal cancers. The PARK2 deletions are mostly focal (31%; approximately 0.5 Mb on average), heterozygous, and show maximum incidence in exons 3 and 4. As PARK2 lies within FRA6E, a large common fragile site, it has been argued that the observed DCN losses in PARK2 in cancer may represent merely the result of enforced replication of locally vulnerable DNA. However, we show that deficiency in expression of PARK2 is significantly associated with adenomatous polyposis coli (APC) deficiency in human colorectal cancer. Evidence of some PARK2 mutations and promoter hypermethylation is described. PARK2 overexpression inhibits cell proliferation in vitro. Moreover, interbreeding of Park2 heterozygous knockout mice with Apc(Min) mice resulted in a dramatic acceleration of intestinal adenoma development and increased polyp multiplicity. We conclude that PARK2 is a tumor suppressor gene whose haploinsufficiency cooperates with mutant APC in colorectal carcinogenesis.

    Funded by: Cancer Research UK: 12401

    Proceedings of the National Academy of Sciences of the United States of America 2010;107;34;15145-50

  • Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes.

    Qi L, Cornelis MC, Kraft P, Stanya KJ, Linda Kao WH, Pankow JS, Dupuis J, Florez JC, Fox CS, Paré G, Sun Q, Girman CJ, Laurie CC, Mirel DB, Manolio TA, Chasman DI, Boerwinkle E, Ridker PM, Hunter DJ, Meigs JB, Lee CH, Hu FB, van Dam RM, Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC) and Diabetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium

    Department of Nutrition, Harvard School of Public Health, and Brigham and Women's Hospital, Boston, MA, USA. nhlqi@channing.harvard.edu

    To identify type 2 diabetes (T2D) susceptibility loci, we conducted genome-wide association (GWA) scans in nested case-control samples from two prospective cohort studies, including 2591 patients and 3052 controls of European ancestry. Validation was performed in 11 independent GWA studies of 10,870 cases and 73,735 controls. We identified significantly associated variants near RBMS1 and ITGB6 genes at 2q24, best-represented by SNP rs7593730 (combined OR=0.90, 95% CI=0.86-0.93; P=3.7x10(-8)). The frequency of the risk-lowering allele T is 0.23. Variants in this region were nominally related to lower fasting glucose and HOMA-IR in the MAGIC consortium (P<0.05). These data suggest that the 2q24 locus may influence the T2D risk by affecting glucose metabolism and insulin resistance.

    Funded by: NCI NIH HHS: CA047988, CA136792, CA54281, CA63464, P01CA089392, P01CA055075, P01CA087969, Z01CP010200; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004436, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004438, U01HG004446, U01HG004729, U01HG004726, U01HG004728, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL043851, HL69757, N01-HC-55022, N01-HC-55018, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55019, N01-HC-55020, N01-HC-55021, N02-HL-6-427, R01 HL071981, R01 HL71981, R01HL086694, R01HL087641, R01HL59367; NIAAA NIH HHS: U10AA008401; NIDA NIH HHS: R01DA013423; NIDCR NIH HHS: U01DE018993, U01DE018903; NIDDK NIH HHS: DK46200, K01-DK067207, K23 DK65978, K24 DK080140, R01DK058845, R01DK075046, R01DK078616, R90DK071507, T90 DK070078, T90 DK070078-05; PHS HHS: HHSN268200625226C, HHSN268200782096C, RFAHG006033

    Human molecular genetics 2010;19;13;2706-15

  • A human gut microbial gene catalogue established by metagenomic sequencing.

    Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, MetaHIT Consortium, Bork P, Ehrlich SD and Wang J

    BGI-Shenzhen, Shenzhen 518083, China.

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, approximately 150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

    Nature 2010;464;7285;59-65

  • PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice.

    Rad R, Rad L, Wang W, Cadinanos J, Vassiliou G, Rice S, Campos LS, Yusa K, Banerjee R, Li MA, de la Rosa J, Strong A, Lu D, Ellis P, Conte N, Yang FT, Liu P and Bradley A

    Wellcome Trust Sanger Institute, Genome Campus, Hinxton-Cambridge CB10 1SA, UK.

    Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability.

    Funded by: Wellcome Trust: 077186, 079643

    Science (New York, N.Y.) 2010;330;6007;1104-7

  • MEROPS: the peptidase database.

    Rawlings ND, Barrett AJ and Bateman A

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. ndr@sanger.ac.uk

    Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The database has a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The classification framework is used for attaching information at each level. An important focus of the database has become distinguishing one peptidase from another through identifying the specificity of the peptidase in terms of where it will cleave substrates and with which inhibitors it will interact. We have collected over 39,000 known cleavage sites in proteins, peptides and synthetic substrates. These allow us to display peptidase specificity and alignments of protein substrates to give an indication of how well a cleavage site is conserved, and thus its probable physiological relevance. While the number of new peptidase families and clans has grown only slowly, the number of complete genomes has greatly increased. This has allowed us to add an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species.

    Funded by: Wellcome Trust: WT077044/Z/05/Z

    Nucleic acids research 2010;38;Database issue;D227-33

  • CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion.

    Reid AJ, Ranea JA, Clegg AB and Orengo CA

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom. ar11@sanger.ac.uk

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom.

    Methodology/principal findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication.

    Conclusions/significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/.

    Funded by: Biotechnology and Biological Sciences Research Council

    PloS one 2010;5;6;e10908

  • Genome-wide association study identifies five loci associated with lung function.

    Repapi E, Sayers I, Wain LV, Burton PR, Johnson T, Obeidat M, Zhao JH, Ramasamy A, Zhai G, Vitart V, Huffman JE, Igl W, Albrecht E, Deloukas P, Henderson J, Granell R, McArdle WL, Rudnicka AR, Wellcome Trust Case Control Consortium, Barroso I, Loos RJ, Wareham NJ, Mustelin L, Rantanen T, Surakka I, Imboden M, Wichmann HE, Grkovic I, Jankovic S, Zgaga L, Hartikainen AL, Peltonen L, Gyllensten U, Johansson A, Zaboli G, Campbell H, Wild SH, Wilson JF, Gläser S, Homuth G, Völzke H, Mangino M, Soranzo N, Spector TD, Polasek O, Rudan I, Wright AF, Heliövaara M, Ripatti S, Pouta A, Naluai AT, Olin AC, Torén K, Cooper MN, James AL, Palmer LJ, Hingorani AD, Wannamethee SG, Whincup PH, Smith GD, Ebrahim S, McKeever TM, Pavord ID, MacLeod AK, Morris AD, Porteous DJ, Cooper C, Dennison E, Shaheen S, Karrasch S, Schnabel E, Schulz H, Grallert H, Bouatia-Naji N, Delplanque J, Froguel P, Blakey JD, NSHD Respiratory Study Team, Britton JR, Morris RW, Holloway JW, Lawlor DA, Hui J, Nyberg F, Jarvelin MR, Jackson C, Kähönen M, Kaprio J, Probst-Hensch NM, Koch B, Hayward C, Evans DM, Elliott P, Strachan DP, Hall IP and Tobin MD

    Departments of Health Sciences and Genetics, Adrian Building, University of Leicester, Leicester, UK.

    Pulmonary function measures are heritable traits that predict morbidity and mortality and define chronic obstructive pulmonary disease (COPD). We tested genome-wide association with forced expiratory volume in 1 s (FEV(1)) and the ratio of FEV(1) to forced vital capacity (FVC) in the SpiroMeta consortium (n = 20,288 individuals of European ancestry). We conducted a meta-analysis of top signals with data from direct genotyping (n ≤ 32,184 additional individuals) and in silico summary association data from the CHARGE Consortium (n = 21,209) and the Health 2000 survey (n ≤ 883). We confirmed the reported locus at 4q31 and identified associations with FEV(1) or FEV(1)/FVC and common variants at five additional loci: 2q35 in TNS1 (P = 1.11 x 10(-12)), 4q24 in GSTCD (P = 2.18 x 10(-23)), 5q33 in HTR4 (P = 4.29 x 10(-9)), 6p21 in AGER (P = 3.07 x 10(-15)) and 15q23 in THSD4 (P = 7.24 x 10(-15)). mRNA analyses showed expression of TNS1, GSTCD, AGER, HTR4 and THSD4 in human lung tissue. These associations offer mechanistic insight into pulmonary function regulation and indicate potential targets for interventions to alleviate respiratory disease.

    Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation: PG/06/154/22043, PG/97012, RG/08/013/25942; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6/2, CZD/16/6/4; Department of Health: 0020029; Medical Research Council: G0000934, G0000943, G0401540, G0500539, G0501942, G0600331, G0600705, G0800582, G0801056, G0902125, G9815508, G990146, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, U.1230.00.008.00005.02; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; Wellcome Trust: 068545/Z/02, 075883, 076113/B/04/Z, 077016/Z/05/Z, 079895, 086160/Z/08/A

    Nature genetics 2010;42;1;36-44

  • Using randomised vectors in transcription factor binding site predictions

    Rezwan F, Sun Y, Davey N, Adams R, Rust AG and Robinson M

    Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010;5708880

  • A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses.

    Ripatti S, Tikkanen E, Orho-Melander M, Havulinna AS, Silander K, Sharma A, Guiducci C, Perola M, Jula A, Sinisalo J, Lokki ML, Nieminen MS, Melander O, Salomaa V, Peltonen L and Kathiresan S

    Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland. samuli.ripatti@fimm.fi

    Background: Comparison of patients with coronary heart disease and controls in genome-wide association studies has revealed several single nucleotide polymorphisms (SNPs) associated with coronary heart disease. We aimed to establish the external validity of these findings and to obtain more precise risk estimates using a prospective cohort design.

    Methods: We tested 13 recently discovered SNPs for association with coronary heart disease in a case-control design including participants differing from those in the discovery samples (3829 participants with prevalent coronary heart disease and 48,897 controls free of the disease) and a prospective cohort design including 30,725 participants free of cardiovascular disease from Finland and Sweden. We modelled the 13 SNPs as a multilocus genetic risk score and used Cox proportional hazards models to estimate the association of genetic risk score with incident coronary heart disease. For case-control analyses we analysed associations between individual SNPs and quintiles of genetic risk score using logistic regression.

    Findings: In prospective cohort analyses, 1264 participants had a first coronary heart disease event during a median 10·7 years' follow-up (IQR 6·7-13·6). Genetic risk score was associated with a first coronary heart disease event. When compared with the bottom quintile of genetic risk score, participants in the top quintile were at 1·66-times increased risk of coronary heart disease in a model adjusting for traditional risk factors (95% CI 1·35-2·04, p value for linear trend=7·3×10(-10)). Adjustment for family history did not change these estimates. Genetic risk score did not improve C index over traditional risk factors and family history (p=0·19), nor did it have a significant effect on net reclassification improvement (2·2%, p=0·18); however, it did have a small effect on integrated discrimination index (0·004, p=0·0006). Results of the case-control analyses were similar to those of the prospective cohort analyses.

    Interpretation: Using a genetic risk score based on 13 SNPs associated with coronary heart disease, we can identify the 20% of individuals of European ancestry who are at roughly 70% increased risk of a first coronary heart disease event. The potential clinical use of this panel of SNPs remains to be defined.

    Funding: The Wellcome Trust; Academy of Finland Center of Excellence for Complex Disease Genetics; US National Institutes of Health; the Donovan Family Foundation.

    Funded by: NHLBI NIH HHS: R01 HL087676; Wellcome Trust: WT089061/Z/09/Z, WT089062/Z/09/Z

    Lancet 2010;376;9750;1393-400

  • Genomic architecture characterizes tumor progression paths and fate in breast cancer patients.

    Russnes HG, Vollan HK, Lingjaerde OC, Krasnitz A, Lundin P, Naume B, Sørlie T, Borgen E, Rye IH, Langerød A, Chin SF, Teschendorff AE, Stephens PJ, Månér S, Schlichting E, Baumbusch LO, Kåresen R, Stratton MP, Wigler M, Caldas C, Zetterberg A, Hicks J and Børresen-Dale AL

    Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway.

    Distinct molecular subtypes of breast carcinomas have been identified, but translation into clinical use has been limited. We have developed two platform-independent algorithms to explore genomic architectural distortion using array comparative genomic hybridization data to measure (i) whole-arm gains and losses [whole-arm aberration index (WAAI)] and (ii) complex rearrangements [complex arm aberration index (CAAI)]. By applying CAAI and WAAI to data from 595 breast cancer patients, we were able to separate the cases into eight subgroups with different distributions of genomic distortion. Within each subgroup, data from expression analyses, sequencing and ploidy indicated that progression occurs along separate paths into more complex genotypes. Histological grade had prognostic impact only in the luminal-related groups, whereas the complexity identified by CAAI had an overall independent prognostic power. This study emphasizes the relation among structural genomic alterations, molecular subtype, and clinical behavior and shows that an objective score of genomic complexity (CAAI) is an independent prognostic marker in breast cancer.

    Funded by: Cancer Research UK: C507/A3086

    Science translational medicine 2010;2;38;38ra47

  • Three authors reply

    Salanti G, Zeggini E and Ioannidis JPA

    American Journal of Epidemiology. 2010;171;1155-6

  • Association of the 9p21.3 locus with risk of first-ever myocardial infarction in Pakistanis: case-control study in South Asia and updated meta-analysis of Europeans.

    Saleheen D, Alexander M, Rasheed A, Wormser D, Soranzo N, Hammond N, Butterworth A, Zaidi M, Haycock P, Bumpstead S, Potter S, Blackburn H, Gray E, Di Angelantonio E, Kaptoge S, Shah N, Samuel M, Janjua A, Sheikh N, Haider SR, Murtaza M, Ahmad U, Hakeem A, Memon MA, Mallick NH, Azhar M, Samad A, Rasheed SZ, Gardezi AR, Memon NA, Ghaffar A, Memon FU, Zaman KS, Kundi A, Yaqoob Z, Cheema LA, Qamar N, Faruqui A, Jooma R, Niazi JH, Hussain M, Kumar K, Saleem A, Kumar K, Daood MS, Memon F, Gul AA, Abbas S, Zafar J, Shahid F, Memon Z, Bhatti SM, Kayani W, Ali SS, Fahim M, Ishaq M, Frossard P, Deloukas P and Danesh J

    Center for Non-Communicable Diseases, Karachi, Pakistan. danish.saleheen@cncdpk.com

    Objective: To examine variants at the 9p21 locus in a case-control study of acute myocardial infarction (MI) in Pakistanis and to perform an updated meta-analysis of published studies in people of European ancestry.

    Methods and results: A total of 1851 patients with first-ever confirmed MI and 1903 controls were genotyped for 89 tagging single-nucleotide polymorphisms at locus 9p21, including the lead variant (rs1333049) identified by the Wellcome Trust Case Control Consortium. Minor allele frequencies and extent of linkage disequilibrium observed in Pakistanis were broadly similar to those seen in Europeans. In the Pakistani study, 6 variants were associated with MI (P<10(-2)) in the initial sample set, and in an additional 741 cases and 674 controls in whom further genotyping was performed for these variants. For Pakistanis, the odds ratio for MI was 1.13 (95% CI, 1.05 to 1.22; P=2 x 10(-3)) for each copy of the C allele at rs1333049. In comparison, a meta-analysis of studies in Europeans yielded an odds ratio of 1.31 (95% CI, 1.26 to 1.37) for the same variant (P=1 x 10(-3) for heterogeneity). Meta-analyses of 23 variants, in up to 38,250 cases and 84,820 controls, generally yielded higher values in Europeans than in Pakistanis.

    Conclusions: To our knowledge, this study provides the first demonstration that variants at the 9p21 locus are significantly associated with MI risk in Pakistanis. However, association signals at this locus were weaker in Pakistanis than those in European studies.

    Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust

    Arteriosclerosis, thrombosis, and vascular biology 2010;30;7;1467-73

  • Genetic determinants of major blood lipids in Pakistanis compared with Europeans.

    Saleheen D, Soranzo N, Rasheed A, Scharnagl H, Gwilliam R, Alexander M, Inouye M, Zaidi M, Potter S, Haycock P, Bumpstead S, Kaptoge S, Di Angelantonio E, Sarwar N, Hunt SE, Sheikh N, Shah N, Samuel M, Haider SR, Murtaza M, Thompson A, Gobin R, Butterworth A, Ahmad U, Hakeem A, Zaman KS, Kundi A, Yaqoob Z, Cheema LA, Qamar N, Faruqui A, Mallick NH, Azhar M, Samad A, Ishaq M, Rasheed SZ, Jooma R, Niazi JH, Gardezi AR, Memon NA, Ghaffar A, Rehman FU, Hoffmann MM, Renner W, Kleber ME, Grammer TB, Stephens J, Attwood A, Koch K, Hussain M, Kumar K, Saleem A, Kumar K, Daood MS, Gul AA, Abbas S, Zafar J, Shahid F, Bhatti SM, Ali SS, Muhammad F, Sagoo G, Bray S, McGinnis R, Dudbridge F, Winkelmann BR, Böehm B, Thompson S, Ouwehand W, März W, Frossard P, Danesh J and Deloukas P

    Center for Non-Communicable Diseases Karachi, Pakistan. danish.saleheen@cncdpk.com

    Background: Evidence is sparse about the genetic determinants of major lipids in Pakistanis.

    Methods and results: Variants (n=45 000) across 2000 genes were assessed in 3200 Pakistanis and compared with 2450 Germans using the same gene array and similar lipid assays. We also did a meta-analysis of selected lipid-related variants in Europeans. Pakistani genetic architecture was distinct from that of several ethnic groups represented in international reference samples. Forty-one variants at 14 loci were significantly associated with levels of HDL-C, triglyceride, or LDL-C. The most significant lipid-related variants identified among Pakistanis corresponded to genes previously shown to be relevant to Europeans, such as CETP associated with HDL-C levels (rs711752; P<10(-13)), APOA5/ZNF259 (rs651821; P<10(-13)) and GCKR (rs1260326; P<10(-13)) with triglyceride levels; and CELSR2 variants with LDL-C levels (rs646776; P<10(-9)). For Pakistanis, these 41 variants explained 6.2%, 7.1%, and 0.9% of the variation in HDL-C, triglyceride, and LDL-C, respectively. Compared with Europeans, the allele frequency of rs662799 in APOA5 among Pakistanis was higher and its impact on triglyceride concentration was greater (P-value for difference <10(-4)).

    Conclusions: Several lipid-related genetic variants are common to Pakistanis and Europeans, though they explain only a modest proportion of population variation in lipid concentration. Allelic frequencies and effect sizes of lipid-related variants can differ between Pakistanis and Europeans.

    Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust

    Circulation. Cardiovascular genetics 2010;3;4;348-57

  • Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge.

    Saxena R, Hivert MF, Langenberg C, Tanaka T, Pankow JS, Vollenweider P, Lyssenko V, Bouatia-Naji N, Dupuis J, Jackson AU, Kao WH, Li M, Glazer NL, Manning AK, Luan J, Stringham HM, Prokopenko I, Johnson T, Grarup N, Boesgaard TW, Lecoeur C, Shrader P, O'Connell J, Ingelsson E, Couper DJ, Rice K, Song K, Andreasen CH, Dina C, Köttgen A, Le Bacquer O, Pattou F, Taneera J, Steinthorsdottir V, Rybin D, Ardlie K, Sampson M, Qi L, van Hoek M, Weedon MN, Aulchenko YS, Voight BF, Grallert H, Balkau B, Bergman RN, Bielinski SJ, Bonnefond A, Bonnycastle LL, Borch-Johnsen K, Böttcher Y, Brunner E, Buchanan TA, Bumpstead SJ, Cavalcanti-Proença C, Charpentier G, Chen YD, Chines PS, Collins FS, Cornelis M, J Crawford G, Delplanque J, Doney A, Egan JM, Erdos MR, Firmann M, Forouhi NG, Fox CS, Goodarzi MO, Graessler J, Hingorani A, Isomaa B, Jørgensen T, Kivimaki M, Kovacs P, Krohn K, Kumari M, Lauritzen T, Lévy-Marchal C, Mayor V, McAteer JB, Meyre D, Mitchell BD, Mohlke KL, Morken MA, Narisu N, Palmer CN, Pakyz R, Pascoe L, Payne F, Pearson D, Rathmann W, Sandbaek A, Sayer AA, Scott LJ, Sharp SJ, Sijbrands E, Singleton A, Siscovick DS, Smith NL, Sparsø T, Swift AJ, Syddall H, Thorleifsson G, Tönjes A, Tuomi T, Tuomilehto J, Valle TT, Waeber G, Walley A, Waterworth DM, Zeggini E, Zhao JH, GIANT consortium, MAGIC investigators, Illig T, Wichmann HE, Wilson JF, van Duijn C, Hu FB, Morris AD, Frayling TM, Hattersley AT, Thorsteinsdottir U, Stefansson K, Nilsson P, Syvänen AC, Shuldiner AR, Walker M, Bornstein SR, Schwarz P, Williams GH, Nathan DM, Kuusisto J, Laakso M, Cooper C, Marmot M, Ferrucci L, Mooser V, Stumvoll M, Loos RJ, Altshuler D, Psaty BM, Rotter JI, Boerwinkle E, Hansen T, Pedersen O, Florez JC, McCarthy MI, Boehnke M, Barroso I, Sladek R, Froguel P, Meigs JB, Groop L, Wareham NJ and Watanabe RM

    Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Glucose levels 2 h after an oral glucose challenge are a clinical measure of glucose tolerance used in the diagnosis of type 2 diabetes. We report a meta-analysis of nine genome-wide association studies (n = 15,234 nondiabetic individuals) and a follow-up of 29 independent loci (n = 6,958-30,620). We identify variants at the GIPR locus associated with 2-h glucose level (rs10423928, beta (s.e.m.) = 0.09 (0.01) mmol/l per A allele, P = 2.0 x 10(-15)). The GIPR A-allele carriers also showed decreased insulin secretion (n = 22,492; insulinogenic index, P = 1.0 x 10(-17); ratio of insulin to glucose area under the curve, P = 1.3 x 10(-16)) and diminished incretin effect (n = 804; P = 4.3 x 10(-4)). We also identified variants at ADCY5 (rs2877716, P = 4.2 x 10(-16)), VPS13C (rs17271305, P = 4.1 x 10(-8)), GCKR (rs1260326, P = 7.1 x 10(-11)) and TCF7L2 (rs7903146, P = 4.2 x 10(-10)) associated with 2-h glucose. Of the three newly implicated loci (GIPR, ADCY5 and VPS13C), only ADCY5 was found to be associated with type 2 diabetes in collaborating studies (n = 35,869 cases, 89,798 controls, OR = 1.12, 95% CI 1.09-1.15, P = 4.8 x 10(-18)).

    Funded by: British Heart Foundation: RG/07/008/23674; Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z01 HG000024-14; Medical Research Council: G0100222, G0600331, G0701863, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_UP_A620_1014, MC_UP_A620_1015; NCI NIH HHS: P01 CA087969, P01 CA087969-12; NCRR NIH HHS: M01 RR000052, M01 RR000052-46, M01 RR001066-26, M01 RR016500-08; NHGRI NIH HHS: U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02; NHLBI NIH HHS: N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085080, N01 HC085081, N01 HC085082, N01 HC085083, N01 HC085084, N01 HC085085, N01 HC085086, N02 HL64278, R01 HL036310, R01 HL036310-21, R01 HL059367, R01 HL059367-10, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087652, R01 HL087652-03, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04; NIA NIH HHS: R01 AG013196, R01 AG013196-16; NIDA NIH HHS: U54 DA021519, U54 DA021519-04; NIDDK NIH HHS: K23 DK065978, K23 DK065978-05, K24 DK080140, K24 DK080140-04, P30 DK072488, P30 DK072488-06, P30 DK079637, P60 DK079637, P60 DK079637-04, R01 DK029867, R01 DK054261, R01 DK054261-09, R01 DK058845, R01 DK058845-11, R01 DK062370, R01 DK062370-05, R01 DK069922, R01 DK069922-03, R01 DK072193, R01 DK072193-04, R01 DK078616, R01 DK078616-03, R01 DK091718; Wellcome Trust: 077016, 088885

    Nature genetics 2010;42;2;142-8

  • Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding.

    Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, Kutter C, Watt S, Martinez-Jimenez CP, Mackay S, Talianidis I, Flicek P and Odom DT

    Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

    Transcription factors (TFs) direct gene expression by binding to DNA regulatory regions. To explore the evolution of gene regulation, we used chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) to determine experimentally the genome-wide occupancy of two TFs, CCAAT/enhancer-binding protein alpha and hepatocyte nuclear factor 4 alpha, in the livers of five vertebrates. Although each TF displays highly conserved DNA binding preferences, most binding is species-specific, and aligned binding events present in all five species are rare. Regions near genes with expression levels that are dependent on a TF are often bound by the TF in multiple species yet show no enhanced DNA sequence constraint. Binding divergence between species can be largely explained by sequence changes to the bound motifs. Among the binding events lost in one lineage, only half are recovered by another binding event within 10 kilobases. Our results reveal large interspecies differences in transcriptional regulation and provide insight into regulatory evolution.

    Funded by: Cancer Research UK: 15603, A15603; European Research Council: 202218; Wellcome Trust: 062023, 079643, WT062023, WT079643

    Science (New York, N.Y.) 2010;328;5981;1036-40

  • CHD7 targets active gene enhancer elements to modulate ES cell-specific gene expression.

    Schnetz MP, Handoko L, Akhtar-Zaidi B, Bartels CF, Pereira CF, Fisher AG, Adams DJ, Flicek P, Crawford GE, Laframboise T, Tesar P, Wei CL and Scacheri PC

    Department of Genetics, Case Western Reserve University, Cleveland, Ohio, United States of America.

    CHD7 is one of nine members of the chromodomain helicase DNA-binding domain family of ATP-dependent chromatin remodeling enzymes found in mammalian cells. De novo mutation of CHD7 is a major cause of CHARGE syndrome, a genetic condition characterized by multiple congenital anomalies. To gain insights into the function of CHD7, we used the technique of chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) to map CHD7 sites in mouse ES cells. We identified 10,483 sites on chromatin bound by CHD7 at high confidence. Most of the CHD7 sites show features of gene enhancer elements. Specifically, CHD7 sites are predominantly located distal to transcription start sites, contain high levels of H3K4 mono-methylation, are found within open chromatin that is hypersensitive to DNase I digestion, and correlate with ES cell-specific gene expression. Moreover, CHD7 co-localizes with P300, a known enhancer-binding protein and strong predictor of enhancer activity. Correlations with 18 other factors mapped by ChIP-seq in mouse ES cells indicate that CHD7 also co-localizes with ES cell master regulators OCT4, SOX2, and NANOG. Correlations between CHD7 sites and global gene expression profiles obtained from Chd7(+/+), Chd7(+/-), and Chd7(-/-) ES cells indicate that CHD7 functions at enhancers as a transcriptional rheostat to modulate, or fine-tune, the expression levels of ES-specific genes. CHD7 can modulate genes in either the positive or negative direction, although negative regulation appears to be the more direct effect of CHD7 binding. These data indicate that enhancer-binding proteins can limit gene expression and are not necessarily co-activators. Although ES cells are not likely to be affected in CHARGE syndrome, we propose that enhancer-mediated gene dysregulation contributes to disease pathogenesis and that the critical CHD7 target genes may be subject to positive or negative regulation.

    Funded by: Medical Research Council: G0800024, MC_U120027516; NHGRI NIH HHS: 1U54HG004557-01, R01 HG004456-01, R01HG003521-01, R01HG004722; NICHD NIH HHS: R01HD056369, T32 HD007104

    PLoS genetics 2010;6;7;e1001023

  • An investigation of clinical and immunological events following repeated aerodigestive tract challenge infections with live Mycobacterium bovis Bacille Calmette Guérin.

    Schreiber F, Huo Z, Giemza R, Woodrow M, Fenner N, Stephens Z, Dougan G, Prideaux S, Castello-Branco LR and Lewis DJ

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. fs1@sanger.ac.uk

    Bacille Calmette Guérin substrain Moreau Rio de Janeiro is an attenuated strain of Mycobacterium bovis that has been used extensively as an oral tuberculosis vaccine. We assessed its potential as a challenge model to study clinical and immunological events following repeated mycobacterial gut infection. Seven individuals received three oral challenges with approximately 10(7) viable bacilli. Clinical symptoms, T-cell responses and gene expression patterns in peripheral blood were monitored. Clinical symptoms were relatively mild and declined following each oral challenge. Delayed T-cell responses were observed, and limited differential gene expression detected by microarrays. Oral challenge with BCG Moreau Rio de Janeiro vaccine was immunogenic in healthy volunteers, limiting its potential to explore clinical innate immune responses, but with low reactogenicity.

    Funded by: Wellcome Trust

    Vaccine 2010;28;33;5427-31

  • Legionella pneumophila strain 130b possesses a unique combination of type IV secretion systems and novel Dot/Icm secretion system effector proteins.

    Schroeder GN, Petty NK, Mousnier A, Harding CR, Vogrin AJ, Wee B, Fry NK, Harrison TG, Newton HJ, Thomson NR, Beatson SA, Dougan G, Hartland EL and Frankel G

    Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College, London, United Kingdom.

    Legionella pneumophila is a ubiquitous inhabitant of environmental water reservoirs. The bacteria infect a wide variety of protozoa and, after accidental inhalation, human alveolar macrophages, which can lead to severe pneumonia. The capability to thrive in phagocytic hosts is dependent on the Dot/Icm type IV secretion system (T4SS), which translocates multiple effector proteins into the host cell. In this study, we determined the draft genome sequence of L. pneumophila strain 130b (Wadsworth). We found that the 130b genome encodes a unique set of T4SSs, namely, the Dot/Icm T4SS, a Trb-1-like T4SS, and two Lvh T4SS gene clusters. Sequence analysis substantiated that a core set of 107 Dot/Icm T4SS effectors was conserved among the sequenced L. pneumophila strains Philadelphia-1, Lens, Paris, Corby, Alcoy, and 130b. We also identified new effector candidates and validated the translocation of 10 novel Dot/Icm T4SS effectors that are not present in L. pneumophila strain Philadelphia-1. We examined the prevalence of the new effector genes among 87 environmental and clinical L. pneumophila isolates. Five of the new effectors were identified in 34 to 62% of the isolates, while less than 15% of the strains tested positive for the other five genes. Collectively, our data show that the core set of conserved Dot/Icm T4SS effector proteins is supplemented by a variable repertoire of accessory effectors that may partly account for differences in the virulences and prevalences of particular L. pneumophila strains.

    Funded by: Medical Research Council: G0700823; Wellcome Trust

    Journal of bacteriology 2010;192;22;6001-16

  • Natural history of Christianson syndrome.

    Schroer RJ, Holden KR, Tarpey PS, Matheus MG, Griesemer DA, Friez MJ, Fan JZ, Simensen RJ, Strømme P, Stevenson RE, Stratton MR and Schwartz CE

    Greenwood Genetic Center, Greenwood, South Carolina, USA. schroer@ggc.org

    Christianson syndrome is an X-linked mental retardation syndrome characterized by microcephaly, impaired ocular movement, severe global developmental delay, hypotonia which progresses to spasticity, and early onset seizures of variable types. Gilfillan et al. [2008] reported mutations in SLC9A6, the gene encoding the sodium/hydrogen exchanger NHE6, in the family first reported and in three others. They also noted the clinical similarities to Angelman syndrome and found cerebellar atrophy on MRI and elevated glutamate/glutamine in the basal ganglia on MRS. Here we report on nonsense mutations in two additional families. The natural history is detailed in childhood and adult life, the similarities to Angelman syndrome confirmed, and the MRI/MRS findings documented in three affected boys.

    American journal of medical genetics. Part A 2010;152A;11;2775-83

  • Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946.

    Sebaihia M, Bocsanczy AM, Biehl BS, Quail MA, Perna NT, Glasner JD, DeClerck GA, Cartinhour S, Schneider DJ, Bentley SD, Parkhill J and Beer SV

    The Sanger Institute, Hinxton, Cambridge, United Kingdom.

    Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related enterobacteria revealed signs of pathoadaptation to rosaceous hosts.

    Journal of bacteriology 2010;192;7;2020-1

  • Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits.

    Segrè AV, DIAGRAM Consortium, MAGIC investigators, Groop L, Mootha VK, Daly MJ and Altshuler D

    Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.

    Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and approximately 1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
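
    As an illustration only, the general idea of scoring genes from variant p-values and then testing a gene set for enrichment can be sketched as follows. This is not the MAGENTA implementation: the function names, the use of the smallest variant p-value as the gene score, the top-fraction cutoff, and the hypergeometric tail used as the null are all assumptions of this sketch, and the confounder correction described in the abstract is omitted.

```python
import math

def gene_scores(variant_pvalues):
    """Map each gene to -log10 of its smallest variant p-value.

    variant_pvalues: dict of gene name -> list of variant p-values.
    (Hypothetical scoring rule; no correction for gene size or LD.)
    """
    return {g: -math.log10(min(ps)) for g, ps in variant_pvalues.items() if ps}

def enrichment_pvalue(scores, gene_set, top_fraction=0.05):
    """Count gene-set members among the top-scoring genes and return
    (hits, hypergeometric upper-tail probability of seeing >= hits)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    top = set(ranked[:n_top])
    members = [g for g in gene_set if g in scores]
    hits = sum(1 for g in members if g in top)
    N, K, n = len(ranked), n_top, len(members)
    total = math.comb(N, n)
    tail = sum(math.comb(K, k) * math.comb(N - K, n - k)
               for k in range(hits, min(K, n) + 1))
    return hits, tail / total

if __name__ == "__main__":
    # Toy data: two strongly associated genes among twenty.
    pvals = {f"g{i}": [0.5] for i in range(20)}
    pvals["g0"] = [1e-8, 0.3]
    pvals["g1"] = [1e-8]
    scores = gene_scores(pvals)
    print(enrichment_pvalue(scores, {"g0", "g1", "g2", "g3"}, top_fraction=0.1))
```

    A permutation-based null, drawing random gene sets of matching size, could replace the hypergeometric tail when gene scores are correlated.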

    PLoS genetics 2010;6;8

  • A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations.

    Shi W, Ayub Q, Vermeulen M, Shao RG, Zuniga S, van der Gaag K, de Knijff P, Kayser M, Xue Y and Tyler-Smith C

    The Wellcome Trust Sanger Institute, Hinxton, Cambs., United Kingdom.

    We have investigated human male demographic history using 590 males from 51 populations in the Human Genome Diversity Project - Centre d'Etude du Polymorphisme Humain worldwide panel, typed with 37 Y-chromosomal Single Nucleotide Polymorphisms and 65 Y-chromosomal Short Tandem Repeats and analyzed with the program Bayesian Analysis of Trees With Internal Node Generation. The general patterns we observe show a gradient from the oldest population time to the most recent common ancestors (TMRCAs) and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa, and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y-lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe, and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan and recent expansion of the Mozabites in North Africa.

    Molecular biology and evolution 2010;27;2;385-93

  • Copy number variant detection in inbred strains from short read sequence data.

    Simpson JT, McIntyre RE, Adams DJ and Durbin R

    Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.

    Summary: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets.
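
    The combination of read density and apparent heterozygosity in a hidden Markov model can be illustrated with a minimal sketch. It is not the published algorithm: the three states, the Poisson emission models, and every numeric parameter below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical copy-number states and per-state parameters (assumptions).
STATES = ("loss", "normal", "gain")
DEPTH_RATIO = {"loss": 0.05, "normal": 1.0, "gain": 2.0}  # depth relative to the mean
HET_RATE = {"loss": 0.01, "normal": 0.05, "gain": 2.0}    # expected apparent het SNPs per window

def log_poisson(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_cnv(depths, hets, mean_depth, stay=0.99):
    """Most likely state path given per-window read counts and het SNP counts."""
    switch = (1.0 - stay) / (len(STATES) - 1)

    def emit(s, d, h):
        return (log_poisson(d, mean_depth * DEPTH_RATIO[s])
                + log_poisson(h, HET_RATE[s]))

    def trans(a, b):
        return math.log(stay if a == b else switch)

    score = {s: math.log(1.0 / len(STATES)) + emit(s, depths[0], hets[0])
             for s in STATES}
    back = []
    for d, h in zip(depths[1:], hets[1:]):
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[prev] + trans(prev, s) + emit(s, d, h)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    depths = [30, 31, 29, 62, 58, 61, 60, 30, 29, 1, 0, 2, 1, 30]
    hets = [0, 0, 0, 2, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0]
    print(viterbi_cnv(depths, hets, mean_depth=30))
```

    In this toy setup a gain is supported both by roughly doubled depth and by apparent heterozygous calls, which would be unexpected at single-copy loci in a homozygous strain.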

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust

    Bioinformatics (Oxford, England) 2010;26;4;565-7

  • Floxin, a resource for genetically engineering mouse ESCs.

    Singla V, Hunkapiller J, Santos N, Seol AD, Norman AR, Wakenight P, Skarnes WC and Reiter JF

    Department of Biochemistry and Biophysics, Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA.

    We describe a method for the highly efficient and precise targeted modification of gene trap loci in mouse embryonic stem cells (ESCs). Through the Floxin method, gene trap mutations were reverted and new DNA sequences inserted using Cre recombinase and a shuttle vector, pFloxin. Floxin technology is applicable to the existing collection of 24,149 compatible gene trap cell lines, which should enable high-throughput modification of many genes in mouse ESCs.

    Funded by: NIAMS NIH HHS: R01 AR054396, R01 AR054396-01A1, R01AR054396

    Nature methods 2010;7;1;50-2

  • Family history of premature coronary heart disease and risk prediction in the EPIC-Norfolk prospective population study.

    Sivapalaratnam S, Boekholdt SM, Trip MD, Sandhu MS, Luben R, Kastelein JJ, Wareham NJ and Khaw KT

    Department of Vascular Medicine, Academic Medical Center, Amsterdam, The Netherlands.

    Objective: The value of a family history for coronary heart disease (CHD) in addition to established cardiovascular risk factors in predicting an individual's risk of CHD is unclear. In the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort, the authors tested whether adding family history of premature CHD in first-degree relatives improves risk prediction compared with the Framingham risk score (FRS) alone.

    This study comprised 10,288 men and 12,553 women aged 40-79 years participating in the EPIC-Norfolk cohort who were followed for a mean of 10.9±2.1 years (mean±SD). The authors computed the FRS as well as a modified score taking into account family history of premature CHD. A family history of CHD was indeed associated with an increased risk of future CHD, independent of established risk factors (FRS-adjusted HR of 1.74 (95% CI 1.56 to 1.95) for family history of premature CHD). However, adding family history of CHD to the FRS resulted in a negative net reclassification of 2%. In the subgroup of individuals estimated to be at intermediate risk, family history of premature CHD resulted in an increase in net reclassification of 2%. The sensitivity increased by 0.4%, and the specificity decreased by 0.8%.

    Conclusion: Although family history of CHD was an independent risk factor of future CHD, its use did not improve classification of individuals into clinically relevant risk categories based on the FRS. Among study participants at intermediate risk of CHD, adding family history of premature CHD resulted in, at best, a modest improvement in reclassification of individuals into a more accurate risk category.

    Funded by: Cancer Research UK; Medical Research Council

    Heart (British Cardiac Society) 2010;96;24;1985-9

  • Neuronal MeCP2 is expressed at near histone-octamer levels and globally alters the chromatin state.

    Skene PJ, Illingworth RS, Webb S, Kerr AR, James KD, Turner DJ, Andrews R and Bird AP

    Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, UK.

    MeCP2 is a nuclear protein with an affinity for methylated DNA that can recruit histone deacetylases. Deficiency or excess of MeCP2 causes severe neurological problems, suggesting that the number of molecules per cell must be precisely regulated. We quantified MeCP2 in neuronal nuclei and found that it is nearly as abundant as the histone octamer. Despite this high abundance, MeCP2 associates preferentially with methylated regions, and high-throughput sequencing showed that its genome-wide binding tracks methyl-CpG density. MeCP2 deficiency results in global changes in neuronal chromatin structure, including elevated histone acetylation and a doubling of histone H1. Neither change is detectable in glia, where MeCP2 occurs at lower levels. The mutant brain also shows elevated transcription of repetitive elements. Our data argue that MeCP2 may not act as a gene-specific transcriptional repressor in neurons, but might instead dampen transcriptional noise genome-wide in a DNA methylation-dependent manner.

    Funded by: Wellcome Trust: 077224, 079643

    Molecular cell 2010;37;4;457-68

  • Constitutional translocation breakpoint mapping by genome-wide paired-end sequencing identifies HACE1 as a putative Wilms tumour susceptibility gene.

    Slade I, Stephens P, Douglas J, Barker K, Stebbings L, Abbaszadeh F, Pritchard-Jones K, FACT collaboration, Cole R, Pizer B, Stiller C, Vujanic G, Scott RH, Stratton MR and Rahman N

    Section Chair and Professor of Human Genetics, The Institute of Cancer Research, 15 Cotswold Road, Sutton SM2 5NG, UK.

    Background: Localisation of the breakpoints of chromosomal translocations has aided the discovery of several disease genes but has traditionally required laborious investigation of chromosomes by fluorescent in situ hybridisation approaches. Here, a strategy that utilises genome-wide paired-end massively parallel DNA sequencing to rapidly map translocation breakpoints is reported. This method was used to fine map a de novo t(5;6)(q21;q21) translocation in a child with bilateral, young-onset Wilms tumour.

    Methods and results: Genome-wide paired-end sequencing was performed for approximately 6 million randomly generated approximately 3 kb fragments from constitutional DNA containing the translocation, and six fragments in which one end mapped to chromosome 5 and the other to chromosome 6 were identified. This mapped the translocation breakpoints to within 1.7 kb. Then, PCR assays that amplified across the rearrangement junction were designed to characterise the breakpoints at sequence-level resolution. The 6q21 breakpoint transects and truncates HACE1, an E3 ubiquitin-protein ligase that has been implicated as a somatically inactivated target in Wilms tumourigenesis. To evaluate the contribution of HACE1 to Wilms tumour predisposition, the gene was mutationally screened in 450 individuals with Wilms tumour. One child with unilateral Wilms tumour and a truncating HACE1 mutation was identified.

    Conclusions: These data indicate that constitutional disruption of HACE1 likely predisposes to Wilms tumour. However, HACE1 mutations are rare and therefore can only make a small contribution to Wilms tumour incidence. More broadly, this study demonstrates the utility of genome-wide paired-end sequencing in the delineation of apparently balanced chromosomal translocations, for which it is likely to become the method of choice.

    Funded by: Cancer Research UK: 11886, 9024, C8620_A8857, C8620_A9024

    Journal of medical genetics 2010;47;5;342-7

  • Common variants at 10 genomic loci influence hemoglobin A1c levels via glycemic and nonglycemic pathways.

    Soranzo N, Sanna S, Wheeler E, Gieger C, Radke D, Dupuis J, Bouatia-Naji N, Langenberg C, Prokopenko I, Stolerman E, Sandhu MS, Heeney MM, Devaney JM, Reilly MP, Ricketts SL, Stewart AF, Voight BF, Willenborg C, Wright B, Altshuler D, Arking D, Balkau B, Barnes D, Boerwinkle E, Böhm B, Bonnefond A, Bonnycastle LL, Boomsma DI, Bornstein SR, Böttcher Y, Bumpstead S, Burnett-Miller MS, Campbell H, Cao A, Chambers J, Clark R, Collins FS, Coresh J, de Geus EJ, Dei M, Deloukas P, Döring A, Egan JM, Elosua R, Ferrucci L, Forouhi N, Fox CS, Franklin C, Franzosi MG, Gallina S, Goel A, Graessler J, Grallert H, Greinacher A, Hadley D, Hall A, Hamsten A, Hayward C, Heath S, Herder C, Homuth G, Hottenga JJ, Hunter-Merrill R, Illig T, Jackson AU, Jula A, Kleber M, Knouff CW, Kong A, Kooner J, Köttgen A, Kovacs P, Krohn K, Kühnel B, Kuusisto J, Laakso M, Lathrop M, Lecoeur C, Li M, Li M, Loos RJ, Luan J, Lyssenko V, Mägi R, Magnusson PK, Mälarstig A, Mangino M, Martínez-Larrad MT, März W, McArdle WL, McPherson R, Meisinger C, Meitinger T, Melander O, Mohlke KL, Mooser VE, Morken MA, Narisu N, Nathan DM, Nauck M, O'Donnell C, Oexle K, Olla N, Pankow JS, Payne F, Peden JF, Pedersen NL, Peltonen L, Perola M, Polasek O, Porcu E, Rader DJ, Rathmann W, Ripatti S, Rocheleau G, Roden M, Rudan I, Salomaa V, Saxena R, Schlessinger D, Schunkert H, Schwarz P, Seedorf U, Selvin E, Serrano-Ríos M, Shrader P, Silveira A, Siscovick D, Song K, Spector TD, Stefansson K, Steinthorsdottir V, Strachan DP, Strawbridge R, Stumvoll M, Surakka I, Swift AJ, Tanaka T, Teumer A, Thorleifsson G, Thorsteinsdottir U, Tönjes A, Usala G, Vitart V, Völzke H, Wallaschofski H, Waterworth DM, Watkins H, Wichmann HE, Wild SH, Willemsen G, Williams GH, Wilson JF, Winkelmann J, Wright AF, WTCCC, Zabena C, Zhao JH, Epstein SE, Erdmann J, Hakonarson HH, Kathiresan S, Khaw KT, Roberts R, Samani NJ, Fleming MD, Sladek R, Abecasis G, Boehnke M, Froguel P, Groop L, McCarthy MI, Kao WH, Florez JC, Uda M, Wareham NJ, Barroso I and Meigs JB

    Human Genetics, Wellcome Trust Sanger Institute, Hinxton, U.K.

    Objective: Glycated hemoglobin (HbA1c), used to monitor and diagnose diabetes, is influenced by average glycemia over a 2- to 3-month period. Genetic factors affecting expression, turnover, and abnormal glycation of hemoglobin could also be associated with increased levels of HbA1c. We aimed to identify such genetic factors and investigate the extent to which they influence diabetes classification based on HbA1c levels.

    Research design and methods: We studied associations with HbA1c in up to 46,368 nondiabetic adults of European descent from 23 genome-wide association studies (GWAS) and 8 cohorts with de novo genotyped single nucleotide polymorphisms (SNPs). We combined studies using inverse-variance meta-analysis and tested mediation by glycemia using conditional analyses. We estimated the global effect of HbA1c loci using a multilocus risk score, and used net reclassification to estimate genetic effects on diabetes screening.

    Results: Ten loci reached genome-wide significant association with HbA1c, including six new loci near FN3K (lead SNP/P value, rs1046896/P = 1.6 × 10⁻²⁶), HFE (rs1800562/P = 2.6 × 10⁻²⁰), TMPRSS6 (rs855791/P = 2.7 × 10⁻¹⁴), ANK1 (rs4737009/P = 6.1 × 10⁻¹²), SPTA1 (rs2779116/P = 2.8 × 10⁻⁹) and ATP11A/TUBGCP3 (rs7998202/P = 5.2 × 10⁻⁹), and four known HbA1c loci: HK1 (rs16926246/P = 3.1 × 10⁻⁵⁴), MTNR1B (rs1387153/P = 4.0 × 10⁻¹¹), GCK (rs1799884/P = 1.5 × 10⁻²⁰) and G6PC2/ABCB11 (rs552976/P = 8.2 × 10⁻¹⁸). We show that associations with HbA1c are partly a function of hyperglycemia associated with 3 of the 10 loci (GCK, G6PC2 and MTNR1B). The seven nonglycemic loci accounted for a 0.19 (% HbA1c) difference between the extreme 10% tails of the risk score, and would reclassify ∼2% of a general white population screened for diabetes with HbA1c.

    Conclusions: GWAS identified 10 genetic loci reproducibly associated with HbA1c. Six are novel and seven map to loci where rarer variants cause hereditary anemias and iron storage disorders. Common variants at these loci likely influence HbA1c levels via erythrocyte biology, and confer a small but detectable reclassification of diabetes diagnosis by HbA1c.
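
    For readers unfamiliar with the inverse-variance meta-analysis mentioned in the methods, a minimal fixed-effect sketch follows. The function name and the example effect sizes are hypothetical, and the study's actual pipeline is not described in the abstract. Each study's effect estimate is weighted by the reciprocal of its squared standard error.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-study effect estimates; ses: their standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided normal p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under a normal null
    return beta, se, z, p

if __name__ == "__main__":
    # Two hypothetical studies with equal precision.
    print(inverse_variance_meta([0.1, 0.3], [0.1, 0.1]))
```

    With equal standard errors the pooled estimate reduces to the simple mean of the study effects, and the pooled standard error shrinks by the square root of the number of studies.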

    Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: G0401527, G0701863, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003; NIDDK NIH HHS: R01 DK072193

    Diabetes 2010;59;12;3229-39

  • Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index.

    Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Lango Allen H, Lindgren CM, Luan J, Mägi R, Randall JC, Vedantam S, Winkler TW, Qi L, Workalemahu T, Heid IM, Steinthorsdottir V, Stringham HM, Weedon MN, Wheeler E, Wood AR, Ferreira T, Weyant RJ, Segrè AV, Estrada K, Liang L, Nemesh J, Park JH, Gustafsson S, Kilpeläinen TO, Yang J, Bouatia-Naji N, Esko T, Feitosa MF, Kutalik Z, Mangino M, Raychaudhuri S, Scherag A, Smith AV, Welch R, Zhao JH, Aben KK, Absher DM, Amin N, Dixon AL, Fisher E, Glazer NL, Goddard ME, Heard-Costa NL, Hoesel V, Hottenga JJ, Johansson A, Johnson T, Ketkar S, Lamina C, Li S, Moffatt MF, Myers RH, Narisu N, Perry JR, Peters MJ, Preuss M, Ripatti S, Rivadeneira F, Sandholt C, Scott LJ, Timpson NJ, Tyrer JP, van Wingerden S, Watanabe RM, White CC, Wiklund F, Barlassina C, Chasman DI, Cooper MN, Jansson JO, Lawrence RW, Pellikka N, Prokopenko I, Shi J, Thiering E, Alavere H, Alibrandi MT, Almgren P, Arnold AM, Aspelund T, Atwood LD, Balkau B, Balmforth AJ, Bennett AJ, Ben-Shlomo Y, Bergman RN, Bergmann S, Biebermann H, Blakemore AI, Boes T, Bonnycastle LL, Bornstein SR, Brown MJ, Buchanan TA, Busonero F, Campbell H, Cappuccio FP, Cavalcanti-Proença C, Chen YD, Chen CM, Chines PS, Clarke R, Coin L, Connell J, Day IN, den Heijer M, Duan J, Ebrahim S, Elliott P, Elosua R, Eiriksdottir G, Erdos MR, Eriksson JG, Facheris MF, Felix SB, Fischer-Posovszky P, Folsom AR, Friedrich N, Freimer NB, Fu M, Gaget S, Gejman PV, Geus EJ, Gieger C, Gjesing AP, Goel A, Goyette P, Grallert H, Grässler J, Greenawalt DM, Groves CJ, Gudnason V, Guiducci C, Hartikainen AL, Hassanali N, Hall AS, Havulinna AS, Hayward C, Heath AC, Hengstenberg C, Hicks AA, Hinney A, Hofman A, Homuth G, Hui J, Igl W, Iribarren C, Isomaa B, Jacobs KB, Jarick I, Jewell E, John U, Jørgensen T, Jousilahti P, Jula A, Kaakinen M, Kajantie E, Kaplan LM, Kathiresan S, Kettunen J, Kinnunen L, Knowles JW, Kolcic I, König IR, Koskinen S, Kovacs P, Kuusisto J, Kraft P, Kvaløy K, Laitinen J, Lantieri O, Lanzani C, Launer LJ, Lecoeur C, Lehtimäki T, Lettre G, Liu J, Lokki ML, Lorentzon M, Luben RN, Ludwig B, MAGIC, Manunta P, Marek D, Marre M, Martin NG, McArdle WL, McCarthy A, McKnight B, Meitinger T, Melander O, Meyre D, Midthjell K, Montgomery GW, Morken MA, Morris AP, Mulic R, Ngwa JS, Nelis M, Neville MJ, Nyholt DR, O'Donnell CJ, O'Rahilly S, Ong KK, Oostra B, Paré G, Parker AN, Perola M, Pichler I, Pietiläinen KH, Platou CG, Polasek O, Pouta A, Rafelt S, Raitakari O, Rayner NW, Ridderstråle M, Rief W, Ruokonen A, Robertson NR, Rzehak P, Salomaa V, Sanders AR, Sandhu MS, Sanna S, Saramies J, Savolainen MJ, Scherag S, Schipf S, Schreiber S, Schunkert H, Silander K, Sinisalo J, Siscovick DS, Smit JH, Soranzo N, Sovio U, Stephens J, Surakka I, Swift AJ, Tammesoo ML, Tardif JC, Teder-Laving M, Teslovich TM, Thompson JR, Thomson B, Tönjes A, Tuomi T, van Meurs JB, van Ommen GJ, Vatin V, Viikari J, Visvikis-Siest S, Vitart V, Vogel CI, Voight BF, Waite LL, Wallaschofski H, Walters GB, Widen E, Wiegand S, Wild SH, Willemsen G, Witte DR, Witteman JC, Xu J, Zhang Q, Zgaga L, Ziegler A, Zitting P, Beilby JP, Farooqi IS, Hebebrand J, Huikuri HV, James AL, Kähönen M, Levinson DF, Macciardi F, Nieminen MS, Ohlsson C, Palmer LJ, Ridker PM, Stumvoll M, Beckmann JS, Boeing H, Boerwinkle E, Boomsma DI, Caulfield MJ, Chanock SJ, Collins FS, Cupples LA, Smith GD, Erdmann J, Froguel P, Grönberg H, Gyllensten U, Hall P, Hansen T, Harris TB, Hattersley AT, Hayes RB, Heinrich J, Hu FB, Hveem K, Illig T, Jarvelin MR, Kaprio J, Karpe F, Khaw KT, Kiemeney LA, Krude H, Laakso M, Lawlor DA, Metspalu A, Munroe PB, Ouwehand WH, Pedersen O, Penninx BW, Peters A, Pramstaller PP, Quertermous T, Reinehr T, Rissanen A, Rudan I, Samani NJ, Schwarz PE, Shuldiner AR, Spector TD, Tuomilehto J, Uda M, Uitterlinden A, Valle TT, Wabitsch M, Waeber G, Wareham NJ, Watkins H, Procardis Consortium, Wilson JF, Wright AF, Zillikens MC, Chatterjee N, McCarroll SA, Purcell S, Schadt EE, Visscher PM, Assimes TL, Borecki IB, Deloukas P, Fox CS, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, Mohlke KL, O'Connell JR, Peltonen L, Schlessinger D, Strachan DP, van Duijn CM, Wichmann HE, Frayling TM, Thorsteinsdottir U, Abecasis GR, Barroso I, Boehnke M, Stefansson K, North KE, McCarthy MI, Hirschhorn JN, Ingelsson E and Loos RJ

    Metabolism Initiative and Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA.

    Obesity is globally prevalent and highly heritable, but its underlying genetic factors remain largely elusive. To identify genetic loci for obesity susceptibility, we examined associations between body mass index and ∼2.8 million SNPs in up to 123,865 individuals with targeted follow-up of 42 SNPs in up to 125,931 additional individuals. We confirmed 14 known obesity susceptibility loci and identified 18 new loci associated with body mass index (P < 5 × 10⁻⁸), one of which includes a copy number variant near GPRC5B. Some loci (at MC4R, POMC, SH2B1 and BDNF) map near key hypothalamic regulators of energy balance, and one of these loci is near GIPR, an incretin receptor. Furthermore, genes in other newly associated loci may provide new insights into human body weight regulation.

    Funded by: British Heart Foundation; Cancer Research UK; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0000934, G0401527, G0501184, G0600331, G0600705, G0601261, G0701863, G0801056, G0900554, G1000758B, G9521010, G9824984, MC_QA137934, MC_U106179471, MC_U106179472, MC_U106188470, MC_U127561128, MC_U137686857; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, U01-CA098233; NCRR NIH HHS: M01-RR00425, U54-RR020278, UL1-RR025005; NHGRI NIH HHS: HG002651, N01-HG-65403, T32 HG000040, T32 HG000040-17, T32-HG00040, U01-HG004399, U01-HG004402, Z01-HG000024; NHLBI NIH HHS: HL084729, HL71981, K99-HL094535, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC55018, N01-HC55019, N01-HC55020, N01-HC55022, N01-HC55222, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, N01-N01HC-55021, N02-HL64278, R01 HL071981, R01 HL087647, R01-HL086694, R01-HL087641, R01-HL087647, R01-HL087652, R01-HL087676, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL59367, U01 HL054527, U01-HL080295, U01-HL084756, U01-HL72515; NIA NIH HHS: N01-AG12100, N01-AG12109, R01-AG031890; NIAAA NIH HHS: AA014041, AA07535, AA10248, AA13320, AA13321, AA13326, K05 AA017688; NIAMS NIH HHS: K08 AR055688, K08 AR055688-03, K08 AR055688-04; NIDA NIH HHS: DA12854, R01 DA012854; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK46200, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30 DK072488, P30-DK072488, R01 DK072193, R01 DK072193-05, R01-DK073490, R01-DK075787, R01DK068336, R01DK075681, T32 DK007191, U01 DK062370-08, U01-DK062418; NIGMS NIH HHS: T32 GM074905, U01-GM074518; NIMH NIH HHS: MH084698, R01-MH59160, R01-MH59565, R01-MH59566, R01-MH59571, R01-MH59586, R01-MH59587, R01-MH59588, R01-MH60870, R01-MH60879, R01-MH61675, R01-MH63706, R01-MH67257, R01-MH79469, R01-MH79470, R01-MH81800, RL1-MH083268; PHS HHS: 263-MA-410953; Wellcome Trust: 064890, 068545, 072960, 075491, 076113, 077016, 079557, 079895, 081682, 083270, 085301, 086596, 090532

    Nature genetics 2010;42;11;937-48

  • The ear.

    Spiden SL and Steel KP

    Embryos, Genes and Birth Defects 2010;Chapter 10;231-62

  • Pooled analysis indicates that the GSTT1 deletion, GSTM1 deletion, and GSTP1 Ile105Val polymorphisms do not modify breast cancer risk in BRCA1 and BRCA2 mutation carriers.

    Spurdle AB, Fahey P, Chen X, McGuffog L, kConFab, Easton D, Peock S, Cook M, EMBRACE, Simard J, INHERIT, Rebbeck TR, MAGIC, Antoniou AC and Chenevix-Trench G

    Division of Genetics and Population Health, Queensland Institute of Medical Research, 300 Herston Rd, Herston 4006, Australia.

    The GSTP1, GSTM1, and GSTT1 detoxification genes all have functional polymorphisms that are common in the general population. A single study of 320 BRCA1/2 carriers previously assessed their effect in BRCA1 or BRCA2 mutation carriers. This study showed no evidence for altered risk of breast cancer for individuals with the GSTT1 and GSTM1 deletion variants, but did report that the GSTP1 Ile105Val (rs1695) variant was associated with increased breast cancer risk in carriers. We investigated the association between these three GST polymorphisms and breast cancer risk using existing data from 718 female BRCA1 and BRCA2 mutation carriers from Australia, the UK, Canada, and the USA. Data were analyzed within a proportional hazards framework using Cox regression. There was no evidence to show that any of the polymorphisms modified disease risk for BRCA1 or BRCA2 carriers, and there was no evidence for heterogeneity between sites. These results support the need for replication studies to confirm or refute hypothesis-generating studies.

    Funded by: Canadian Institutes of Health Research; Cancer Research UK: 10118, 11022, 11174, C1287/A10118, C1287/A8874; NCI NIH HHS: R01 CA083855, R01 CA083855-01, R01 CA083855-02, R01 CA083855-03, R01 CA083855-04, R01 CA083855-05, R01 CA083855-06, R01 CA083855-07, R01 CA083855-08, R01 CA083855-09, R01 CA083855-10, R01 CA083855-11, R01 CA102776, R01 CA102776-01A1, R01 CA102776-02, R01 CA102776-03, R01 CA102776-04, R01 CA102776-05, R01-CA083855, R01-CA102776

    Breast cancer research and treatment 2010;122;1;281-5

  • Low-density lipoprotein receptor-related protein 5 polymorphisms are associated with bone mineral density in Greek postmenopausal women: an interaction with calcium intake.

    Stathopoulou MG, Dedoussis GV, Trovas G, Katsalira A, Hammond N, Deloukas P and Lyritis GP

    Department of Dietetics and Nutrition, Harokopio University, 17671 Athens, Greece.

    The low-density lipoprotein receptor-related protein 5 (LRP5) has been shown to play a significant role in bone biology. This study aimed to assess the association of four common polymorphisms of the LRP5 gene with bone mineral density (BMD) and possible gene × calcium intake interactions in Greek postmenopausal women. For this observational cross-sectional association study, healthy postmenopausal women (N=578) were recruited (between December 2006 and January 2008) and genotyped for four polymorphisms (rs1784235, rs491347, rs4988321, and rs4988330) in the LRP5 gene. Measurements of BMD were performed and detailed medical, dietary, and anthropometric data were recorded. Student t tests and multiple linear regression models were applied after controlling for potential covariates (ie, age, weight, height, and calcium intake). None of the polymorphisms was associated with the presence of osteoporosis, fractures, and hip BMD. All polymorphisms were associated with unadjusted spine BMD, with the exception of rs4988330. Only rs4988321 was associated with adjusted spine BMD, where the presence of the A allele was associated with significantly lower spine BMD compared with the GG genotype (P=0.002). An interaction of the rs4988321 polymorphism with calcium intake (P=0.016) was found. The carriers of the A allele demonstrated significantly lower spine BMD compared to GG homozygotes (P=0.001) only in the lowest calcium intake group (<680 mg/day), whereas in the highest calcium intake group no differences were found in BMD between genotypes. These findings demonstrate that both rs4988321 polymorphism and its interaction with calcium intake are associated with BMD, whereas higher calcium intake was shown to decrease the negative effect of this polymorphism on BMD.

    Journal of the American Dietetic Association 2010;110;7;1078-83

  • A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies.

    Stegle O, Parts L, Durbin R and Winn J

    Max Planck Institutes Tübingen, Tübingen, Germany. oliver.stegle@tuebingen.mpg.de

    Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors resulting in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/.

    Funded by: Wellcome Trust: WT077192/Z/05/Z

    PLoS computational biology 2010;6;5;e1000770

  • Genome-wide end-sequenced BAC resources for the NOD/MrkTac and NOD/ShiLtJ mouse genomes.

    Steward CA, Humphray S, Plumb B, Jones MC, Quail MA, Rice S, Cox T, Davies R, Bonfield J, Keane TM, Nefedov M, de Jong PJ, Lyons P, Wicker L, Todd J, Hayashizaki Y, Gulban O, Danska J, Harrow J, Hubbard T, Rogers J and Adams DJ

    The Wellcome Trust Sanger Institute, Hinxton, UK. cas@sanger.ac.uk

    Non-obese diabetic (NOD) mice spontaneously develop type 1 diabetes (T1D) due to the progressive loss of insulin-secreting beta-cells by an autoimmune driven process. NOD mice represent a valuable tool for studying the genetics of T1D and for evaluating therapeutic interventions. Here we describe the development and characterization by end-sequencing of bacterial artificial chromosome (BAC) libraries derived from NOD/MrkTac (DIL NOD) and NOD/ShiLtJ (CHORI-29), two commonly used NOD substrains. The DIL NOD library is composed of 196,032 BACs and the CHORI-29 library is composed of 110,976 BACs. The average depth of genome coverage of the DIL NOD library, estimated from mapping the BAC end-sequences to the reference mouse genome sequence, was 7.1-fold across the autosomes and 6.6-fold across the X chromosome. Clones from this library have an average insert size of 150 kb and map to over 95.6% of the reference mouse genome assembly (NCBIm37), covering 98.8% of Ensembl mouse genes. By the same metric, the CHORI-29 library has an average depth over the autosomes of 5.0-fold and 2.8-fold coverage of the X chromosome, the reduced X chromosome coverage being due to the use of a male donor for this library. Clones from this library have an average insert size of 205 kb and map to 93.9% of the reference mouse genome assembly, covering 95.7% of Ensembl genes. We have identified and validated 191,841 single nucleotide polymorphisms (SNPs) for DIL NOD and 114,380 SNPs for CHORI-29. In total we generated 229,736,133 bp of sequence for the DIL NOD and 121,963,211 bp for the CHORI-29. These BAC libraries represent a powerful resource for functional studies, such as gene targeting in NOD embryonic stem (ES) cell lines, and for sequencing and mapping experiments.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust: 062023, 077198

    Genomics 2010;95;2;105-10

  • Leena Peltonen 1952-2010 Obituary

    Stratton MR and Lander E

    Cell 2010;141;208-9

  • Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging.

    Surakka I, Kristiansson K, Anttila V, Inouye M, Barnes C, Moutsianas L, Salomaa V, Daly M, Palotie A, Peltonen L and Ripatti S

    Institute for Molecular Medicine Finland, FIMM, University of Helsinki, FI-00014 Helsinki, Finland.

    The combining of genome-wide association (GWA) data across populations represents a major challenge for massive global meta-analyses. Genotype imputation using densely genotyped reference samples facilitates the combination of data across different genotyping platforms. HapMap data is typically used as a reference for single nucleotide polymorphism (SNP) imputation and tagging copy number polymorphisms (CNPs). However, the advantage of having population-specific reference panels for founder populations has not been evaluated. We looked at the properties and impact of adding 81 individuals from a founder population to HapMap3 reference data on imputation quality, CNP tagging, and power to detect association in simulations and in an independent cohort of 2138 individuals. The gain in SNP imputation accuracy was highest among low-frequency markers (minor allele frequency [MAF] < 5%), for which adding the population-specific samples to the reference set increased the median R(2) between imputed and genotyped SNPs from 0.90 to 0.94. Accuracy also increased in regions with high recombination rates. Similarly, a reference set with population-specific extension facilitated the identification of better tag-SNPs for a subset of CNPs; for 4% of CNPs the R(2) between SNP genotypes and CNP intensity in the independent population cohort was at least twice as high as without the extension. We conclude that even a relatively small population-specific reference set yields considerable benefits in SNP imputation, CNP tagging accuracy, and the power to detect associations in founder populations and population isolates in particular.

    Funded by: Wellcome Trust: WT089061/Z/09/Z, WT089062/Z/09/Z

    Genome research 2010;20;10;1344-51
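
    The imputation-accuracy measure quoted in the entry above, the R(2) between imputed and genotyped SNPs, can be illustrated as a squared Pearson correlation between imputed allele dosages and directly genotyped allele counts. The following is a minimal sketch under that assumption, not the authors' pipeline; the function name and the example dosages are hypothetical.

```python
def squared_correlation(imputed, genotyped):
    """Squared Pearson correlation between imputed dosages and genotype counts.

    Both arguments are equal-length sequences with one value per individual,
    e.g. imputed dosages in [0, 2] and genotyped allele counts in {0, 1, 2}.
    """
    n = len(imputed)
    if n != len(genotyped) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(imputed) / n
    mean_y = sum(genotyped) / n

    # Sums of squares and cross-products around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, genotyped))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in genotyped)

    # A monomorphic marker carries no information; report 0 instead of dividing by 0.
    if var_x == 0 or var_y == 0:
        return 0.0
    return (cov * cov) / (var_x * var_y)


# Hypothetical dosages for four individuals at one SNP.
example_r2 = squared_correlation([0.1, 0.9, 1.8, 1.1], [0, 1, 2, 1])
```

    In a study of this kind the value would be computed per marker and then summarised, for example as a median within a minor allele frequency bin.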

  • Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally.

    Sutherland CJ, Tanomsing N, Nolder D, Oguike M, Jennison C, Pukrittayakamee S, Dolecek C, Hien TT, do Rosário VE, Arez AP, Pinto J, Michon P, Escalante AA, Nosten F, Burke M, Lee R, Blaze M, Otto TD, Barnwell JW, Pain A, Williams J, White NJ, Day NP, Snounou G, Lockhart PJ, Chiodini PL, Imwong M and Polley SD

    Health Protection Agency Malaria Reference Laboratory, Immunology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom. colin.sutherland@lshtm.ac.uk

    Background: Malaria in humans is caused by apicomplexan parasites belonging to 5 species of the genus Plasmodium. Infections with Plasmodium ovale are widely distributed but rarely investigated, and the resulting burden of disease is not known. Dimorphism in defined genes has led to P. ovale parasites being divided into classic and variant types. We hypothesized that these dimorphs represent distinct parasite species.

    Methods: Multilocus sequence analysis of 6 genetic characters was carried out among 55 isolates from 12 African and 3 Asia-Pacific countries.

    Results: Each genetic character displayed complete dimorphism and segregated perfectly between the 2 types. Both types were identified in samples from Ghana, Nigeria, São Tomé, Sierra Leone, and Uganda and have been described previously in Myanmar. Splitting of the 2 lineages is estimated to have occurred between 1.0 and 3.5 million years ago in hominid hosts.

    Conclusions: We propose that P. ovale comprises 2 nonrecombining species that are sympatric in Africa and Asia. We speculate on possible scenarios that could have led to this speciation. Furthermore, the relatively high frequency of imported cases of symptomatic P. ovale infection in the United Kingdom suggests that the morbidity caused by ovale malaria has been underestimated.

    Funded by: NIGMS NIH HHS: R01 GM080586; Wellcome Trust: 093956

    The Journal of infectious diseases 2010;201;10;1544-50

  • Common variants in the ATP2B1 gene are associated with susceptibility to hypertension: the Japanese Millennium Genome Project.

    Tabara Y, Kohara K, Kita Y, Hirawa N, Katsuya T, Ohkubo T, Hiura Y, Tajima A, Morisaki T, Miyata T, Nakayama T, Takashima N, Nakura J, Kawamoto R, Takahashi N, Hata A, Soma M, Imai Y, Kokubo Y, Okamura T, Tomoike H, Iwai N, Ogihara T, Inoue I, Tokunaga K, Johnson T, Caulfield M, Munroe P, Global Blood Pressure Genetics Consortium, Umemura S, Ueshima H and Miki T

    Department of Basic Medical Research and Education, Ehime University Graduate School of Medicine, Toon-City, Ehime, Japan. tabara@m.ehime-u.ac.jp

    Hypertension is one of the most common complex genetic disorders. We have described previously 38 single nucleotide polymorphisms (SNPs) with suggestive association with hypertension in Japanese individuals. In this study we extend our previous findings by analyzing a large sample of Japanese individuals (n=14 105) for the most associated SNPs. We also conducted replication analyses in Japanese of susceptibility loci for hypertension identified recently from genome-wide association studies of European ancestries. Association analysis revealed significant association of the ATP2B1 rs2070759 polymorphism with hypertension (P=5.3×10(-5); allelic odds ratio: 1.17 [95% CI: 1.09 to 1.26]). Additional SNPs in ATP2B1 were subsequently genotyped, and the most significant association was with rs11105378 (odds ratio: 1.31 [95% CI: 1.21 to 1.42]; P=4.1×10(-11)). Association of rs11105378 with hypertension was cross-validated by replication analysis with the Global Blood Pressure Genetics consortium data set (odds ratio: 1.13 [95% CI: 1.05 to 1.21]; P=5.9×10(-4)). Mean adjusted systolic blood pressure was highly significantly associated with the same SNP in a meta-analysis with individuals of European descent (P=1.4×10(-18)). ATP2B1 mRNA expression levels in umbilical artery smooth muscle cells were found to be significantly different among rs11105378 genotypes. Seven SNPs discovered in published genome-wide association studies were also genotyped in the Japanese population. In the combined analysis with the 3 replicated genes (FGF5 rs1458038, CYP17A1 rs1004467, and CSK rs1378942), the odds ratio of the highest risk group was 2.27 (95% CI: 1.65 to 3.12; P=4.6×10(-7)) compared with the lower risk group. In summary, this study confirmed common genetic variation in ATP2B1, as well as FGF5, CYP17A1, and CSK, to be associated with blood pressure levels and risk of hypertension.

    Funded by: Medical Research Council: G0400874, G0401527, G0801056, MC_U105630924, MC_UP_A100_1003

    Hypertension 2010;56;5;973-80
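
    The odds ratios, confidence intervals and P values reported in the entry above are linked by a standard relationship: on the log scale, a 95% CI spans roughly ±1.96 standard errors around the estimate. The sketch below assumes a Wald-type interval and a normal approximation, which the abstract does not state, so it should be read as an approximate consistency check, not as the authors' method. The function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_z_and_p(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Approximate z statistic and two-sided P value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale (a Wald interval).
    """
    # Half the log-scale CI width equals z_crit standard errors.
    standard_error = (log(ci_upper) - log(ci_lower)) / (2 * z_crit)
    z = log(odds_ratio) / standard_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# rs11105378 as reported above: odds ratio 1.31, 95% CI 1.21 to 1.42.
z_stat, p_val = wald_z_and_p(1.31, 1.21, 1.42)
```

    Because the published odds ratio and interval are rounded to two decimals, a value recovered this way can only be expected to agree with the reported P value to within rounding error.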

  • Rec8-containing cohesin maintains bivalents without turnover during the growing phase of mouse oocytes.

    Tachibana-Konwalski K, Godwin J, van der Weyden L, Champion L, Kudo NR, Adams DJ and Nasmyth K

    Department of Biochemistry, University of Oxford, Oxford, United Kingdom.

    During female meiosis, bivalent chromosomes are thought to be held together from birth until ovulation by sister chromatid cohesion mediated by cohesin complexes whose ring structure depends on kleisin subunits, either Rec8 or Scc1. Because cohesion is established at DNA replication in the embryo, its maintenance for such a long time may require cohesin turnover. To address whether Rec8- or Scc1-containing cohesin holds bivalents together and whether it turns over, we created mice whose kleisin subunits can be cleaved by TEV protease. We show by microinjection experiments and confocal live-cell imaging that Rec8 cleavage triggers chiasmata resolution during meiosis I and sister centromere disjunction during meiosis II, while Scc1 cleavage triggers sister chromatid disjunction in the first embryonic mitosis, demonstrating a dramatic transition from Rec8- to Scc1-containing cohesin at fertilization. Crucially, activation of an ectopic Rec8 transgene during the growing phase of Rec8(TEV/TEV) oocytes does not prevent TEV-mediated bivalent destruction, implying little or no cohesin turnover for ≥2 wk during oocyte growth. We suggest that the inability of oocytes to regenerate cohesion may contribute to age-related meiosis I errors.

    Funded by: Cancer Research UK; Medical Research Council: G0701161, G0901046; Wellcome Trust

    Genes & development 2010;24;22;2505-16

  • Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.

    Tee WW, Pardo M, Theunissen TW, Yu L, Choudhary JS, Hajkova P and Surani MA

    Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.

    Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.

    Funded by: Medical Research Council: G0800784; Wellcome Trust

    Genes & development 2010;24;24;2772-7

  • Methodological challenges of genome-wide association analysis in Africa.

    Teo YY, Small KS and Kwiatkowski DP

    Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.

    Medical research in Africa has yet to benefit from the advent of genome-wide association (GWA) analysis, partly because the genotyping tools and statistical methods that have been developed for European and Asian populations struggle to deal with the high levels of genome diversity and population structure in Africa. However, the haplotypic diversity of African populations might help to overcome one of the major roadblocks in GWA research, the fine mapping of causal variants. We review the methodological challenges and consider how GWA studies in Africa will be transformed by new approaches in statistical imputation and large-scale genome sequencing.

    Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 077383, 082370

    Nature reviews. Genetics 2010;11;2;149-60

  • Biological, clinical and population relevance of 95 loci for blood lipids.

    Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, Johansen CT, Fouchier SW, Isaacs A, Peloso GM, Barbalic M, Ricketts SL, Bis JC, Aulchenko YS, Thorleifsson G, Feitosa MF, Chambers J, Orho-Melander M, Melander O, Johnson T, Li X, Guo X, Li M, Shin Cho Y, Jin Go M, Jin Kim Y, Lee JY, Park T, Kim K, Sim X, Twee-Hee Ong R, Croteau-Chonka DC, Lange LA, Smith JD, Song K, Hua Zhao J, Yuan X, Luan J, Lamina C, Ziegler A, Zhang W, Zee RY, Wright AF, Witteman JC, Wilson JF, Willemsen G, Wichmann HE, Whitfield JB, Waterworth DM, Wareham NJ, Waeber G, Vollenweider P, Voight BF, Vitart V, Uitterlinden AG, Uda M, Tuomilehto J, Thompson JR, Tanaka T, Surakka I, Stringham HM, Spector TD, Soranzo N, Smit JH, Sinisalo J, Silander K, Sijbrands EJ, Scuteri A, Scott J, Schlessinger D, Sanna S, Salomaa V, Saharinen J, Sabatti C, Ruokonen A, Rudan I, Rose LM, Roberts R, Rieder M, Psaty BM, Pramstaller PP, Pichler I, Perola M, Penninx BW, Pedersen NL, Pattaro C, Parker AN, Pare G, Oostra BA, O'Donnell CJ, Nieminen MS, Nickerson DA, Montgomery GW, Meitinger T, McPherson R, McCarthy MI, McArdle W, Masson D, Martin NG, Marroni F, Mangino M, Magnusson PK, Lucas G, Luben R, Loos RJ, Lokki ML, Lettre G, Langenberg C, Launer LJ, Lakatta EG, Laaksonen R, Kyvik KO, Kronenberg F, König IR, Khaw KT, Kaprio J, Kaplan LM, Johansson A, Jarvelin MR, Janssens AC, Ingelsson E, Igl W, Kees Hovingh G, Hottenga JJ, Hofman A, Hicks AA, Hengstenberg C, Heid IM, Hayward C, Havulinna AS, Hastie ND, Harris TB, Haritunians T, Hall AS, Gyllensten U, Guiducci C, Groop LC, Gonzalez E, Gieger C, Freimer NB, Ferrucci L, Erdmann J, Elliott P, Ejebe KG, Döring A, Dominiczak AF, Demissie S, Deloukas P, de Geus EJ, de Faire U, Crawford G, Collins FS, Chen YD, Caulfield MJ, Campbell H, Burtt NP, Bonnycastle LL, Boomsma DI, Boekholdt SM, Bergman RN, Barroso I, Bandinelli S, Ballantyne CM, Assimes TL, Quertermous T, Altshuler D, Seielstad M, Wong TY, Tai ES, Feranil AB, Kuzawa CW, Adair LS, Taylor HA, Borecki IB, Gabriel SB, Wilson JG, Holm H, Thorsteinsdottir U, Gudnason V, Krauss RM, Mohlke KL, Ordovas JM, Munroe PB, Kooner JS, Tall AR, Hegele RA, Kastelein JJ, Schadt EE, Rotter JI, Boerwinkle E, Strachan DP, Mooser V, Stefansson K, Reilly MP, Samani NJ, Schunkert H, Cupples LA, Sandhu MS, Ridker PM, Rader DJ, van Duijn CM, Peltonen L, Abecasis GR, Boehnke M and Kathiresan S

    Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.

    Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 x 10(-8)), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.

    Funded by: British Heart Foundation: PG/02/128, PG/08/094, PG/08/094/26019, RG/07/005/23633, SP/08/005/25115; Chief Scientist Office: CZB/4/710; FIC NIH HHS: TW05596; Medical Research Council: G0000934, G0401527, G0601966, G0700931, G0701863, G0801056, G0801566, G9521010, G9521010D, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128; NCI NIH HHS: CA 047988; NCRR NIH HHS: M01-RR00425, RR20649, U54 RR020278, UL1RR025005; NHGRI NIH HHS: 1Z01 HG000024, N01-HG-65403, T32 HG00040, U01HG004402; NHLBI NIH HHS: 5R01HL087679-02, 5R01HL08770003, 5R01HL08821502, HL 04381, HL 080467, HL-54776, HL085144, K99 HL098364, K99 HL098364-01, K99HL094535, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-6-4278, R01 HL087647, R01 HL087676, R01 HL089650, R01HL086694, R01HL087641, R01HL087652, R01HL59367, RC1 HL099634, RC1 HL099634-02, RC1 HL099793, RC2 HL101864, RC2 HL102419, T32HL007208, U01 HL069757, U01 HL080295; NIA NIH HHS: N01-AG-12100; NICHD NIH HHS: R24 HD050924; NIDDK NIH HHS: 5R01DK06833603, 5R01DK07568102, DK062370, DK063491, DK072193, DK078150, DK56350, R01 DK072193, R01 DK078150, U01 DK062370, U01 DK062418; NIEHS NIH HHS: ES10126; NIGMS NIH HHS: T32 GM007092; PHS HHS: HHSN268200625226C; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895

    Nature 2010;466;7307;707-13

  • The systematic functional analysis of Plasmodium protein kinases identifies essential regulators of mosquito transmission.

    Tewari R, Straschil U, Bateman A, Böhme U, Cherevach I, Gong P, Pain A and Billker O

    Institute of Genetics, QMC, University of Nottingham, Nottingham NG7 2UH, UK. rita.tewari@nottingham.ac.uk

    Although eukaryotic protein kinases (ePKs) contribute to many cellular processes, only three Plasmodium falciparum ePKs have thus far been identified as essential for parasite asexual blood stage development. To identify pathways essential for parasite transmission between their mammalian host and mosquito vector, we undertook a systematic functional analysis of ePKs in the genetically tractable rodent parasite Plasmodium berghei. Modeling domain signatures of conventional ePKs identified 66 putative Plasmodium ePKs. Kinomes are highly conserved between Plasmodium species. Using reverse genetics, we show that 23 ePKs are redundant for asexual erythrocytic parasite development in mice. Phenotyping mutants at four life cycle stages in Anopheles stephensi mosquitoes revealed functional clusters of kinases required for sexual development and sporogony. Roles for a putative SR protein kinase (SRPK) in microgamete formation, a conserved regulator of clathrin uncoating (GAK) in ookinete formation, and a likely regulator of energy metabolism (SNF1/KIN) in sporozoite development were identified.

    Funded by: Medical Research Council: G0501670, G0900109; Wellcome Trust: 087656, WT078335MA, WT089085/Z/09/Z

    Cell host & microbe 2010;8;4;377-87

  • An adaptable two-color flow cytometric assay to quantitate the invasion of erythrocytes by Plasmodium falciparum parasites.

    Theron M, Hesketh RL, Subramanian S and Rayner JC

    Sanger Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.

    Plasmodium falciparum genotyping has recently undergone a revolution, and genome-wide genotype datasets are now being collected for large numbers of parasite isolates. By contrast, phenotyping technologies have lagged behind, with few high throughput phenotyping platforms available. Invasion of human erythrocytes by Plasmodium falciparum is a phenotype of particular interest because of its central role in parasite development. Invasion is a variable phenotype influenced by natural genetic variation in both the parasite and host and is governed by multiple overlapping and in some instances redundant parasite-erythrocyte interactions. To facilitate the scale-up of erythrocyte invasion phenotyping, we have developed a novel platform based on two-color flow cytometry that distinguishes parasite invasion from parasite growth. Target cells that had one or more receptors removed using enzymatic treatment were prelabeled with intracellular dyes CFDA-SE or DDAO-SE, incubated with P. falciparum parasites, and parasites that had invaded either labeled or unlabeled cells were detected with fluorescent DNA-intercalating dyes Hoechst 33342 or SYBR Green I. Neither cell label interfered with erythrocyte invasion, and the combination of cell and parasite dyes recapitulated known invasion phenotypes for three standard laboratory strains. Three different dye combinations with minimal overlap have been validated, meaning the same assay can be adapted to instruments harboring several different combinations of laser lines. The assay is sensitive, operates in a 96-well format, and can be used to quantitate the impact of natural or experimental genetic variation on erythrocyte invasion efficiency.

    Funded by: Wellcome Trust

    Cytometry. Part A : the journal of the International Society for Analytical Cytology 2010;77;11;1067-74

  • De novo apparently balanced translocations in man are predominantly paternal in origin and associated with a significant increase in paternal age.

    Thomas NS, Morris JK, Baptista J, Ng BL, Crolla JA and Jacobs PA

    Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury SP2 8TE, UK. simon.thomas@salisbury.nhs.uk

    Background: Congenital chromosome abnormalities are relatively common in our species and among structural abnormalities the most common class is balanced reciprocal translocations. Determining the parental origin of de novo balanced translocations may provide insights into how and when they arise. While there is a general paternal bias in the origin of non-recurrent unbalanced rearrangements, there are few data on parental origin of non-recurrent balanced rearrangements.

    Methods: The parental origin of a series of de novo balanced reciprocal translocations was determined using DNA from flow sorted derivative chromosomes and linkage analysis.

    Results: Of 27 translocations, we found 26 to be of paternal origin and only one of maternal origin. We also found the paternally derived translocations to be associated with a significantly increased paternal age (p<0.008).

    Conclusion: Our results suggest there is a very pronounced paternal bias in the origin of all non-recurrent reciprocal translocations and that they may arise during one of the numerous mitotic divisions that occur in the spermatogonial germ cells prior to meiosis.

    Funded by: Wellcome Trust: WT077008

    Journal of medical genetics 2010;47;2;112-5
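
    The 26-to-1 split between paternal and maternal origin reported in the entry above can be put in perspective with an exact binomial test against the null hypothesis that either parent is equally likely to be the origin. The abstract gives no P value for this split (the quoted p<0.008 refers to paternal age), so the sketch below is an illustrative calculation under that stated null, not a result from the paper. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided binomial P value under a null success probability of 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided P value is
    twice the probability of the smaller tail, capped at 1.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    smaller_tail = min(successes, trials - successes)
    # Count outcomes at least as extreme as the observed one in the smaller tail.
    tail_outcomes = sum(comb(trials, i) for i in range(smaller_tail + 1))
    return min(1.0, 2 * tail_outcomes / 2 ** trials)


# 26 of 27 de novo translocations were of paternal origin.
p_paternal_bias = binomial_two_sided_p(26, 27)
```

    Under these assumptions the split is very unlikely to arise from an unbiased process, which is in line with the pronounced paternal bias the authors describe.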

  • CpG islands influence chromatin structure via the CpG-binding protein Cfp1.

    Thomson JP, Skene PJ, Selfridge J, Clouaire T, Guy J, Webb S, Kerr AR, Deaton A, Andrews R, James KD, Turner DJ, Illingworth R and Bird A

    Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, UK.

    CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides. Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity. In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro. Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.

    Funded by: Cancer Research UK; Medical Research Council: G0800026; Wellcome Trust: 079643, 091580, 098051

    Nature 2010;464;7291;1082-6

  • A visual migraine aura locus maps to 9q21-q22.

    Tikka-Kleemola P, Artto V, Vepsäläinen S, Sobel EM, Räty S, Kaunisto MA, Anttila V, Hämäläinen E, Sumelahti ML, Ilmavirta M, Färkkilä M, Kallela M, Palotie A and Wessman M

    Folkhälsan Research Center, Biomedicum Helsinki, PO Box 63, 00014 University of Helsinki, Finland. maija.wessman@helsinki.fi

    Objective: To identify susceptibility loci for visual migraine aura in migraine families primarily affected with scintillating scotoma type of aura.

    Methods: We included Finnish migraine families with at least 2 affected family members with scintillating scotoma as defined by the International Criteria for Headache Disorders-II. A total of 36 multigenerational families containing 351 individuals were included, 185 of whom have visual aura and 159 have scintillating scotoma. Parametric and nonparametric linkage analyses were performed with 378 microsatellite markers. The most promising linkage loci found were fine-mapped with additional microsatellite markers.

    Results: A novel locus on chromosome 9q22-q31 for migraine aura was identified (HLOD = 4.7 at 104 cM). Fine-mapping identified a shared haplotype segment of 12 cM (9.8 Mb) on 9q21-q22 among the aura affected. Four other loci showed linkage to aura: a locus on 12p13 showed significant evidence of linkage, and suggestive evidence of linkage was detected to loci on chromosomes 5q13, 6q25, and 13q14.

    Conclusions: A novel visual migraine aura locus has been mapped to chromosome 9q21-q22. Interestingly, this region has previously been linked to occipitotemporal lobe epilepsy with prominent visual symptoms. Our finding further supports a shared genetic background in migraine and epilepsy and suggests that susceptibility variant(s) to visual aura for both of these traits are located in the 9q21-q22 locus.

    Funded by: NIGMS NIH HHS: GM053275

    Neurology 2010;74;15;1171-7

  • Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.

    Tsai IJ, Otto TD and Berriman M

    Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. jit@sanger.ac.uk

    Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

    Funded by: Wellcome Trust: WT 085775/Z/08/Z

    Genome biology 2010;11;4;R41

  • Variants near DMRT1, TERT and ATF7IP are associated with testicular germ cell cancer.

    Turnbull C, Rapley EA, Seal S, Pernet D, Renwick A, Hughes D, Ricketts M, Linger R, Nsengimana J, Deloukas P, Huddart RA, Bishop DT, Easton DF, Stratton MR, Rahman N and UK Testicular Cancer Collaboration

    Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK. clare.turnbull@icr.ac.uk

    We conducted a genome-wide association study for testicular germ cell tumor, genotyping 298,782 SNPs in 979 affected individuals and 4,947 controls from the UK and replicating associations in a further 664 cases and 3,456 controls. We identified three new susceptibility loci, two of which include genes that are involved in telomere regulation. We identified two independent signals within the TERT-CLPTM1L locus on chromosome 5, which has previously been associated with multiple other cancers (rs4635969, OR=1.54, P=1.14x10(-23); rs2736100, OR=1.33, P=7.55x10(-15)). We also identified a locus on chromosome 12 (rs2900333, OR=1.27, P=6.16x10(-10)) that contains ATF7IP, a regulator of TERT expression. Finally, we identified a locus on chromosome 9 (rs755383, OR=1.37, P=1.12x10(-23)), containing the sex determination gene DMRT1, which has been linked to teratoma susceptibility in mice.

    Funded by: Cancer Research UK; Department of Health; Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02

    Nature genetics 2010;42;7;604-7

  • Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a.

    Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, King RJ, Lin AA, Chow CE, Semino O, Battaglia V, Kutuev I, Järve M, Chaubey G, Ayub Q, Mohyuddin A, Mehdi SQ, Sengupta S, Rogaev EI, Khusnutdinova EK, Pshenichnov A, Balanovsky O, Balanovska E, Jeran N, Augustin DH, Baldovic M, Herrera RJ, Thangaraj K, Singh V, Singh L, Majumder P, Rudan P, Primorac D, Villems R and Kivisild T

    Division of Child and Adolescent Psychiatry and Child Development, Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 1201 Welch Road, Stanford, CA 94304-5485, USA. under@stanford.edu

    Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.

    European journal of human genetics : EJHG 2010;18;4;479-84

  • The Swedish new variant of Chlamydia trachomatis: genome sequence, morphology, cell tropism and phenotypic characterization.

    Unemo M, Seth-Smith HM, Cutcliffe LT, Skilton RJ, Barlow D, Goulding D, Persson K, Harris SR, Kelly A, Bjartling C, Fredlund H, Olcén P, Thomson NR and Clarke IN

    National Reference Laboratory for Pathogenic Neisseria, Department of Laboratory Medicine, Orebro University Hospital, Orebro, Sweden. magnus.unemo@orebroll.se

    Chlamydia trachomatis is a major cause of bacterial sexually transmitted infections worldwide. In 2006, a new variant of C. trachomatis (nvCT), carrying a 377 bp deletion within the plasmid, was reported in Sweden. This deletion included the targets used by the commercial diagnostic systems from Roche and Abbott. The nvCT is clonal (serovar/genovar E) and it spread rapidly in Sweden, undiagnosed by these systems. The degree of spread may also indicate an increased biological fitness of nvCT. The aims of this study were to describe the genome of nvCT, to compare the nvCT genome to all available C. trachomatis genome sequences and to investigate the biological properties of nvCT. An early nvCT isolate (Sweden2) was analysed by genome sequencing, growth kinetics, microscopy, cell tropism assay and antimicrobial susceptibility testing. It was compared with relevant C. trachomatis isolates, including a similar serovar E C. trachomatis wild-type strain that circulated in Sweden prior to the initially undetected expansion of nvCT. The nvCT genome does not contain any major genetic polymorphisms - the genes for central metabolism, development cycle and virulence are conserved - or phenotypic characteristics that indicate any altered biological fitness. This is supported by the observations that the nvCT and wild-type C. trachomatis infections are very similar in terms of epidemiological distribution, and that differences in clinical signs are only described, in one study, in women. In conclusion, the nvCT does not appear to have any altered biological fitness. Therefore, the rapid transmission of nvCT in Sweden was due to the strong diagnostic selective advantage and its introduction into a high-frequency transmitting population.

    Funded by: Wellcome Trust: 080348

    Microbiology (Reading, England) 2010;156;Pt 5;1394-404

  • Chemokine ligand 2 genetic variants, serum monocyte chemoattractant protein-1 levels, and the risk of coronary artery disease.

    van Wijk DF, van Leuven SI, Sandhu MS, Tanck MW, Hutten BA, Wareham NJ, Kastelein JJ, Stroes ES, Khaw KT and Boekholdt SM

    Department of Vascular Medicine, Academic Medical Center, Amsterdam, the Netherlands. d.f.vanwijk@amc.uva.nl

    Objective: In humans, evidence about the association between levels of monocyte chemoattractant protein-1 (MCP-1), its coding gene chemokine (C-C motif) ligand 2 (CCL2), and risk of coronary artery disease (CAD) is contradictory.

    Methods and results: We performed a nested case-control study in the prospective EPIC-Norfolk cohort investigating the relationship between CCL2 single-nucleotide polymorphisms (SNPs), MCP-1 concentrations, and the risk of future CAD. Cases (n=1138) were apparently healthy men and women aged 45 to 79 years who developed fatal or nonfatal CAD during a mean follow-up of 6 years. Controls (n=2237) were matched by age, sex, and enrollment time. Using linear regression analysis no association between CCL2 SNPs and MCP-1 serum concentrations became apparent, nor did we find a significant association between MCP-1 serum levels and risk of future CAD. Finally, Cox regression analysis showed no significant association between CCL2 SNPs and the future CAD risk. In addition, we did not find any robust associations between the CCL2 haplotypes and MCP-1 serum concentration or future CAD risk.

    Conclusions: Our data do not support previous publications indicating that MCP-1 is involved in the pathogenesis of CAD.

    Funded by: Cancer Research UK; Medical Research Council

    Arteriosclerosis, thrombosis, and vascular biology 2010;30;7;1460-6

  • Somatic structural rearrangements in genetically engineered mouse mammary tumors.

    Varela I, Klijn C, Stephens PJ, Mudie LJ, Stebbings L, Galappaththige D, van der Gulden H, Schut E, Klarenbeek S, Campbell PJ, Wessels LF, Stratton MR, Jonkers J, Futreal PA and Adams DJ

    Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. iv3@sanger.ac.uk

    Background: Here we present the first paired-end sequencing of tumors from genetically engineered mouse models of cancer to determine how faithfully these models recapitulate the landscape of somatic rearrangements found in human tumors. These were models of Trp53-mutated breast cancer, Brca1- and Brca2-associated hereditary breast cancer, and E-cadherin (Cdh1) mutated lobular breast cancer.

    Results: We show that although Brca1- and Brca2-deficient mouse mammary tumors have a defect in the homologous recombination pathway, there is no apparent difference in the type or frequency of somatic rearrangements found in these cancers when compared to other mouse mammary cancers, and tumors from all genetic backgrounds showed evidence of microhomology-mediated repair and non-homologous end-joining processes. Importantly, mouse mammary tumors were found to carry fewer structural rearrangements than human mammary cancers and expressed in-frame fusion genes. Like the fusion genes found in human mammary tumors, these were not recurrent. One mouse tumor was found to contain an internal deletion of exons of the Lrp1b gene, which led to a smaller in-frame transcript. We found internal in-frame deletions in the human ortholog of this gene in a significant number (4.2%) of human cancer cell lines.

    Conclusions: Paired-end sequencing of mouse mammary tumors revealed that they display significant heterogeneity in their profiles of somatic rearrangement but, importantly, fewer rearrangements than cognate human mammary tumors, probably because these cancers have been induced by strong driver mutations engineered into the mouse genome. Both human and mouse mammary cancers carry expressed fusion genes and conserved homozygous deletions.

    Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust: 088340, 093867

    Genome biology 2010;11;10;R100

  • The use of DNA transposons for cancer gene discovery in mice.

    Vassiliou G, Rad R and Bradley A

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Insertional mutagenesis in mice is a potent instrument for cancer gene discovery. Until recently, retroviruses were the main experimental tools in this field and application of insertional mutagenesis was limited to tissues for which these agents have tropism, namely hemopoietic cells and mammary epithelium. However, the field has been revolutionized and greatly expanded with the recent reanimation of the transposons, a highly flexible group of insertional mutagens first discovered in maize, which have now been adapted for use in mammalian cells. Transposons do not only extend the application of insertional mutagenesis to any tissue of choice, but also allow a more extensive and unbiased coverage of the genome, can be designed to selectively activate or inactivate genes, and are highly amenable to temporal and spatial control. This chapter gives an overview of the design and application of transposons to cancer gene discovery in mice.

    Methods in enzymology 2010;477;91-106

  • The Molecular Basis of Leukaemia and Lymphoma

    Vassiliou GS and Green AR

    Postgraduate Haematology 2010;Chapter 21;380-94

  • Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis.

    Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, Welch RP, Zeggini E, Huth C, Aulchenko YS, Thorleifsson G, McCulloch LJ, Ferreira T, Grallert H, Amin N, Wu G, Willer CJ, Raychaudhuri S, McCarroll SA, Langenberg C, Hofmann OM, Dupuis J, Qi L, Segrè AV, van Hoek M, Navarro P, Ardlie K, Balkau B, Benediktsson R, Bennett AJ, Blagieva R, Boerwinkle E, Bonnycastle LL, Bengtsson Boström K, Bravenboer B, Bumpstead S, Burtt NP, Charpentier G, Chines PS, Cornelis M, Couper DJ, Crawford G, Doney AS, Elliott KS, Elliott AL, Erdos MR, Fox CS, Franklin CS, Ganser M, Gieger C, Grarup N, Green T, Griffin S, Groves CJ, Guiducci C, Hadjadj S, Hassanali N, Herder C, Isomaa B, Jackson AU, Johnson PR, Jørgensen T, Kao WH, Klopp N, Kong A, Kraft P, Kuusisto J, Lauritzen T, Li M, Lieverse A, Lindgren CM, Lyssenko V, Marre M, Meitinger T, Midthjell K, Morken MA, Narisu N, Nilsson P, Owen KR, Payne F, Perry JR, Petersen AK, Platou C, Proença C, Prokopenko I, Rathmann W, Rayner NW, Robertson NR, Rocheleau G, Roden M, Sampson MJ, Saxena R, Shields BM, Shrader P, Sigurdsson G, Sparsø T, Strassburger K, Stringham HM, Sun Q, Swift AJ, Thorand B, Tichet J, Tuomi T, van Dam RM, van Haeften TW, van Herpt T, van Vliet-Ostaptchouk JV, Walters GB, Weedon MN, Wijmenga C, Witteman J, Bergman RN, Cauchi S, Collins FS, Gloyn AL, Gyllensten U, Hansen T, Hide WA, Hitman GA, Hofman A, Hunter DJ, Hveem K, Laakso M, Mohlke KL, Morris AD, Palmer CN, Pramstaller PP, Rudan I, Sijbrands E, Stein LD, Tuomilehto J, Uitterlinden A, Walker M, Wareham NJ, Watanabe RM, Abecasis GR, Boehm BO, Campbell H, Daly MJ, Hattersley AT, Hu FB, Meigs JB, Pankow JS, Pedersen O, Wichmann HE, Barroso I, Florez JC, Frayling TM, Groop L, Sladek R, Thorsteinsdottir U, Wilson JF, Illig T, Froguel P, van Duijn CM, Stefansson K, Altshuler D, Boehnke M, McCarthy MI, MAGIC investigators and GIANT Consortium

    Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA.

    By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P<5x10(-8). These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.

    Funded by: Chief Scientist Office: CZB/4/710; Department of Health: DHCS/07/07/008; Medical Research Council: G0600331, G0601261, G0700222, G0700222(81696), G0701863, MC_U106179471, MC_U106179474, MC_U127592696; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: 1 Z01 HG000024, U01HG004171, U01HG004399, U01HG004402; NHLBI NIH HHS: 1K99HL094535-01A1, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02-HL-6-4278, R01HL086694, R01HL087641, R01HL59367; NIAMS NIH HHS: 1K08AR055688, K08 AR055688, K08 AR055688-03; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193, DK073490, DK078616, DK58845, K23-DK65978, K24-DK080140, R01 DK029867, R01 DK072193; PHS HHS: HHSN268200625226C; Wellcome Trust: 064890, 072960, 075491, 076113, 077016, 079557, 081682, 083270, 086596, 088885, 090532

    Nature genetics 2010;42;7;579-89

  • Comparison of two DNA microarrays for detection of plasmid-mediated antimicrobial resistance and virulence factor genes in clinical isolates of Enterobacteriaceae and non-Enterobacteriaceae.

    Walsh F, Cooke NM, Smith SG, Moran GP, Cooke FJ, Ivens A, Wain J and Rogers TR

    Department of Clinical Microbiology, Sir Patrick Dun Translational Research Laboratory, School of Medicine, University of Dublin, Trinity College, St James's Hospital Campus, Dublin 8, Ireland. fiona1walsh@gmail.com

    A DNA microarray was developed to detect plasmid-mediated antimicrobial resistance (AR) and virulence factor (VF) genes in clinical isolates of Enterobacteriaceae and non-Enterobacteriaceae. The array was validated with the following bacterial species: Escherichia coli (n=17); Klebsiella pneumoniae (n=3); Enterobacter spp. (n=6); Acinetobacter genospecies 3 (n=1); Acinetobacter baumannii (n=1); Pseudomonas aeruginosa (n=2); and Stenotrophomonas maltophilia (n=2). The AR gene profiles of these isolates were identified by polymerase chain reaction (PCR). The DNA microarray consisted of 155 and 133 AR and VF gene probes, respectively. Results were compared with the commercially available Identibac AMR-ve Array Tube. Hybridisation results indicated that there was excellent correlation between PCR and array results for AR and VF genes. Genes conferring resistance to each antibiotic class were identified by the DNA array. Unusual resistance genes were also identified, such as bla(SHV-5) in a bla(OXA-23)-positive carbapenem-resistant A. baumannii. The phylogenetic group of each E. coli isolate was verified by the array. These data demonstrate that it is possible to screen simultaneously for all important classes of mobile AR and VF genes in Enterobacteriaceae and non-Enterobacteriaceae whilst also assigning a correct phylogenetic group to E. coli isolates. Therefore, it is feasible to test clinical Gram-negative bacteria for all known AR genes and to provide important information regarding pathogenicity simultaneously.

    Funded by: Medical Research Council

    International journal of antimicrobial agents 2010;35;6;593-8

  • The genome of a songbird.

    Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong L, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TA, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, Nam K, Backström N, Smeds L, Nabholz B, Itoh Y, Whitney O, Pfenning AR, Howard J, Völker M, Skinner BM, Griffin DK, Ye L, McLaren WM, Flicek P, Quesada V, Velasco G, Lopez-Otin C, Puente XS, Olender T, Lancet D, Smit AF, Hubley R, Konkel MK, Walker JA, Batzer MA, Gu W, Pollock DD, Chen L, Cheng Z, Eichler EE, Stapley J, Slate J, Ekblom R, Birkhead T, Burke T, Burt D, Scharff C, Adam I, Richard H, Sultan M, Soldatov A, Lehrach H, Edwards SV, Yang SP, Li X, Graves T, Fulton L, Nelson J, Chinwalla A, Hou S, Mardis ER and Wilson RK

    The Genome Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA. wwarren@watson.wustl.edu

    The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken - the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D013704/1, BB/D013704/2, BB/E010652/1, BB/F007590/1, BBE0175091, BBS/E/I/00001425; Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; NHGRI NIH HHS: R01 HG002939, U54 HG003079; NIDA NIH HHS: P30 DA018310; NIDCD NIH HHS: R01 DC007218; NIGMS NIH HHS: R01 GM059290, R01 GM085233, R01 GM59290; NINDS NIH HHS: R01 NS045264, R01NS051820

    Nature 2010;464;7289;757-62

  • Genetic variants influencing circulating lipid levels and risk of coronary artery disease.

    Waterworth DM, Ricketts SL, Song K, Chen L, Zhao JH, Ripatti S, Aulchenko YS, Zhang W, Yuan X, Lim N, Luan J, Ashford S, Wheeler E, Young EH, Hadley D, Thompson JR, Braund PS, Johnson T, Struchalin M, Surakka I, Luben R, Khaw KT, Rodwell SA, Loos RJ, Boekholdt SM, Inouye M, Deloukas P, Elliott P, Schlessinger D, Sanna S, Scuteri A, Jackson A, Mohlke KL, Tuomilehto J, Roberts R, Stewart A, Kesäniemi YA, Mahley RW, Grundy SM, Wellcome Trust Case Control Consortium, McArdle W, Cardon L, Waeber G, Vollenweider P, Chambers JC, Boehnke M, Abecasis GR, Salomaa V, Järvelin MR, Ruokonen A, Barroso I, Epstein SE, Hakonarson HH, Rader DJ, Reilly MP, Witteman JC, Hall AS, Samani NJ, Strachan DP, Barter P, van Duijn CM, Kooner JS, Peltonen L, Wareham NJ, McPherson R, Mooser V and Sandhu MS

    Genetics Division, GlaxoSmithKline R&D, King of Prussia, PA, USA.

    Objective: Genetic studies might provide new insights into the biological mechanisms underlying lipid metabolism and risk of CAD. We therefore conducted a genome-wide association study to identify novel genetic determinants of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides.

    Methods and results: We combined genome-wide association data from 8 studies, comprising up to 17 723 participants with information on circulating lipid concentrations. We did independent replication studies in up to 37 774 participants from 8 populations and also in a population of Indian Asian descent. We also assessed the association between single-nucleotide polymorphisms (SNPs) at lipid loci and risk of CAD in up to 9 633 cases and 38 684 controls. We identified 4 novel genetic loci that showed reproducible associations with lipids (probability values, 1.6×10(-8) to 3.1×10(-10)). These include a potentially functional SNP in the SLC39A8 gene for HDL-C, an SNP near the MYLIP/GMPR and PPP1R3B genes for LDL-C, and at the AFF1 gene for triglycerides. SNPs showing strong statistical association with 1 or more lipid traits at the CELSR2, APOB, APOE-C1-C4-C2 cluster, LPL, ZNF259-APOA5-A4-C3-A1 cluster and TRIB1 loci were also associated with CAD risk (probability values, 1.1×10(-3) to 1.2×10(-9)).

    Conclusions: We have identified 4 novel loci associated with circulating lipids. We also show that in addition to those that are largely associated with LDL-C, genetic loci mainly associated with circulating triglycerides and HDL-C are also associated with risk of CAD. These findings potentially provide new insights into the biological mechanisms underlying lipid metabolism and CAD risk.

    Funded by: British Heart Foundation: PG/08/094, PG/08/094/26019; Medical Research Council: G0000934, G0401527, G0500539, G0601966, G0700931, G0701863, G0801056B, G0801566, MC_QA137934, MC_U105630924, MC_U106179471, MC_U106188470; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: R01 DK062370, U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 068545/Z/02, 077016/Z/05/Z, 079895, GR069224

    Arteriosclerosis, thrombosis, and vascular biology 2010;30;11;2264-76

  • Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.

    Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF, Giannoulatou E, Holmes C, Marchini JL, Stirrups K, Tobin MD, Wain LV, Yau C, Aerts J, Ahmad T, Andrews TD, Arbury H, Attwood A, Auton A, Ball SG, Balmforth AJ, Barrett JC, Barroso I, Barton A, Bennett AJ, Bhaskar S, Blaszczyk K, Bowes J, Brand OJ, Braund PS, Bredin F, Breen G, Brown MJ, Bruce IN, Bull J, Burren OS, Burton J, Byrnes J, Caesar S, Clee CM, Coffey AJ, Connell JM, Cooper JD, Dominiczak AF, Downes K, Drummond HE, Dudakia D, Dunham A, Ebbs B, Eccles D, Edkins S, Edwards C, Elliot A, Emery P, Evans DM, Evans G, Eyre S, Farmer A, Ferrier IN, Feuk L, Fitzgerald T, Flynn E, Forbes A, Forty L, Franklyn JA, Freathy RM, Gibbs P, Gilbert P, Gokumen O, Gordon-Smith K, Gray E, Green E, Groves CJ, Grozeva D, Gwilliam R, Hall A, Hammond N, Hardy M, Harrison P, Hassanali N, Hebaishi H, Hines S, Hinks A, Hitman GA, Hocking L, Howard E, Howard P, Howson JM, Hughes D, Hunt S, Isaacs JD, Jain M, Jewell DP, Johnson T, Jolley JD, Jones IR, Jones LA, Kirov G, Langford CF, Lango-Allen H, Lathrop GM, Lee J, Lee KL, Lees C, Lewis K, Lindgren CM, Maisuria-Armer M, Maller J, Mansfield J, Martin P, Massey DC, McArdle WL, McGuffin P, McLay KE, Mentzer A, Mimmack ML, Morgan AE, Morris AP, Mowat C, Myers S, Newman W, Nimmo ER, O'Donovan MC, Onipinla A, Onyiah I, Ovington NR, Owen MJ, Palin K, Parnell K, Pernet D, Perry JR, Phillips A, Pinto D, Prescott NJ, Prokopenko I, Quail MA, Rafelt S, Rayner NW, Redon R, Reid DM, Renwick, Ring SM, Robertson N, Russell E, St Clair D, Sambrook JG, Sanderson JD, Schuilenburg H, Scott CE, Scott R, Seal S, Shaw-Hawkins S, Shields BM, Simmonds MJ, Smyth DJ, Somaskantharajah E, Spanova K, Steer S, Stephens J, Stevens HE, Stone MA, Su Z, Symmons DP, Thompson JR, Thomson W, Travers ME, Turnbull C, Valsesia A, Walker M, Walker NM, Wallace C, Warren-Perry M, Watkins NA, Webster J, Weedon MN, Wilson AG, Woodburn M, Wordsworth BP, Young AH, Zeggini E, Carter NP, Frayling TM, Lee C, McVean G, Munroe PB, Palotie A, Sawcer SJ, Scherer SW, Strachan DP, Tyler-Smith C, Brown MA, Burton PR, Caulfield MJ, Compston A, Farrall M, Gough SC, Hall AS, Hattersley AT, Hill AV, Mathew CG, Pembrey M, Satsangi J, Stratton MR, Worthington J, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand W, Parkes M, Rahman N, Todd JA, Samani NJ and Donnelly P

    Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease - IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes - although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.

    Funded by: Arthritis Research UK: 17552; British Heart Foundation: RG/09/012/28096; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0400874, G0500115, G0501942, G0600329, G0600705, G0700491, G0701003, G0701420, G0701810, G0701810(85517), G0800383, G0800509, G0800759, G0801418B, G19/9, G90/106, G9521010, G9817803B, MC_UP_A390_1107; Wellcome Trust: 061858, 083948, 089989, 090532

    Nature 2010;464;7289;713-20

  • Distinct variants at LIN28B influence growth in height from birth to adulthood.

    Widén E, Ripatti S, Cousminer DL, Surakka I, Lappalainen T, Järvelin MR, Eriksson JG, Raitakari O, Salomaa V, Sovio U, Hartikainen AL, Pouta A, McCarthy MI, Osmond C, Kajantie E, Lehtimäki T, Viikari J, Kähönen M, Tyler-Smith C, Freimer N, Hirschhorn JN, Peltonen L and Palotie A

    Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland. elisabeth.widen@helsinki.fi

    We have studied the largely unknown genetic underpinnings of height growth by using a unique resource of longitudinal childhood height data available in Finnish population cohorts. After applying GWAS mapping of potential genes influencing pubertal height growth followed by further characterization of the genetic effects on complete postnatal growth trajectories, we have identified strong association between variants near LIN28B and pubertal growth (rs7759938; female p = 4.0 x 10(-9), male p = 1.5 x 10(-4), combined p = 5.0 x 10(-11), n = 5038). Analysis of growth during early puberty confirmed an effect on the timing of the growth spurt. Correlated SNPs have previously been implicated as influencing both adult stature and age at menarche, the same alleles associating with taller height and later age of menarche in other studies as with later pubertal growth here. Additionally, a partially correlated LIN28B SNP, rs314277, has been associated previously with final height. Testing both rs7759938 and rs314277 (pairwise r(2) = 0.29) for independent effects on postnatal growth in 8903 subjects indicated that the pubertal timing-associated marker rs7759938 affects prepubertal growth in females (p = 7 x 10(-5)) and final height in males (p = 5 x 10(-4)), whereas rs314277 has sex-specific effects on growth (p for interaction = 0.005) that were distinct from those observed at rs7759938. In conclusion, partially correlated variants at LIN28B tag distinctive, complex, and sex-specific height-growth-regulating effects, influencing the entire period of postnatal growth. These findings imply a critical role for LIN28B in the regulation of human growth.

    Funded by: Medical Research Council: G0500539; Wellcome Trust: 89061/Z/09/Z, WT089062

    American journal of human genetics 2010;86;5;773-82

  • The activating mutation R201C in GNAS promotes intestinal tumourigenesis in Apc(Min/+) mice through activation of Wnt and ERK1/2 MAPK pathways.

    Wilson CH, McIntyre RE, Arends MJ and Adams DJ

    Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.

    Somatically acquired, activating mutations of GNAS, the gene encoding the stimulatory G-protein Gsalpha subunit, have been identified in kidney, thyroid, pituitary, Leydig cell, adrenocortical and, more recently, in colorectal tumours, suggesting that mutations such as R201C may be oncogenic in these tissues. To study the role of GNAS in intestinal tumourigenesis, we placed GNAS R201C under the control of the A33-antigen promoter (Gpa33), which is almost exclusively expressed in the intestines. The GNAS R201C mutation has been shown to result in the constitutive activation of Gsalpha and adenylate cyclase and to lead to the autonomous synthesis of cyclic adenosine monophosphate (cAMP). Gpa33(tm1(GnasR201C)Wtsi/+) mice showed significantly elevated cAMP levels and a compensatory upregulation of cAMP-specific phosphodiesterases in the intestinal epithelium. GNAS R201C alone was not sufficient to induce tumourigenesis by 12 months, but there was a significant increase in adenoma formation when Gpa33(tm1(GnasR201C)Wtsi/+) mice were bred onto an Apc(Min/+) background. GNAS R201C expression was associated with elevated expression of Wnt and extracellular signal-regulated kinase 1/2 mitogen-activated protein kinase (ERK1/2 MAPK) pathway target genes, increased phosphorylation of ERK1/2 MAPK and increased immunostaining for the proliferation marker Ki67. Furthermore, the effects of GNAS R201C on the Wnt pathway were additive to the inactivation of Apc. Our data strongly suggest that activating mutations of GNAS cooperate with inactivation of APC and are likely to contribute to colorectal tumourigenesis.

    Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356

    Oncogene 2010;29;32;4567-75

  • Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

    Wood HM, Belvedere O, Conway C, Daly C, Chalkley R, Bickerdike M, McKinley C, Egan P, Ross L, Hayward B, Morgan J, Davidson L, MacLennan K, Ong TK, Papagiannopoulos K, Cook I, Adams DJ, Taylor GR and Rabbitts P

    Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK. h.m.wood@leeds.ac.uk

    The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.

    Funded by: Cancer Research UK; Wellcome Trust

    Nucleic acids research 2010;38;14;e151

  • Plasmodium falciparum ATP6 not under selection during introduction of artemisinin combination therapy in Peru.

    Woodrow CJ and Bustamante LY

    Antimicrobial agents and chemotherapy 2010;54;5;2280; author reply 2280-1

  • Commercially available outbred mice for genome-wide association studies.

    Yalcin B, Nicod J, Bhomra A, Davidson S, Cleak J, Farinelli L, Østerås M, Whitley A, Yuan W, Gan X, Goodson M, Klenerman P, Satpathy A, Mathis D, Benoist C, Adams DJ, Mott R and Flint J

    Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

    Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 Kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes.

    Funded by: Medical Research Council: G0800024; Wellcome Trust: 079912

    PLoS genetics 2010;6;9;e1001085

  • Racial/ethnic differences in association of fasting glucose-associated genomic loci with fasting glucose, HOMA-B, and impaired fasting glucose in the U.S. adult population.

    Yang Q, Liu T, Shrader P, Yesupriya A, Chang MH, Dowling NF, Ned RM, Dupuis J, Florez JC, Khoury MJ, Meigs JB and MAGIC Investigators

    Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, Georgia, USA. qay0@cdc.gov

    Objective: To estimate allele frequencies and the marginal and combined effects of novel fasting glucose (FG)-associated single nucleotide polymorphisms (SNPs) on FG levels and on risk of impaired FG (IFG) among non-Hispanic white, non-Hispanic black, and Mexican Americans.

    Research design and methods: DNA samples from 3,024 adult fasting participants in the National Health and Nutrition Examination Survey (NHANES) III (1991-1994) were genotyped for 16 novel FG-associated SNPs in multiple genes. We determined the allele frequencies and influence of these SNPs alone and in a weighted genetic risk score on FG, homeostasis model assessment of β-cell function (HOMA-B), and IFG by race/ethnicity, while adjusting for age and sex.

    Results: All allele frequencies varied significantly by race/ethnicity. A weighted genetic risk score, based on 16 SNPs, was associated with a 0.022 mmol/l (95% CI 0.009-0.035), 0.036 mmol/l (0.019-0.052), and 0.033 mmol/l (0.020-0.046) increase in FG levels per risk allele among non-Hispanic whites, non-Hispanic blacks, and Mexican Americans, respectively. Adjusted odds ratios for IFG were 1.78 for non-Hispanic whites (95% CI 1.00-3.17), 2.40 for non-Hispanic blacks (1.07-5.37), and 2.39 for Mexican Americans (1.37-4.14) when we compared the highest with the lowest quintiles of genetic risk score (P=0.365 for testing heterogeneity of effect across race/ethnicity).

    Conclusions: We conclude that allele frequencies of 16 novel FG-associated SNPs vary significantly by race/ethnicity, but the influence of these SNPs on FG levels, HOMA-B, and IFG was generally consistent across all racial/ethnic groups.

    Funded by: NIDDK NIH HHS: K23 DK65978, K24 DK080140, R01 DK078616

    Diabetes care 2010;33;11;2370-7

  • Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.

    Yang TP, Beazley C, Montgomery SB, Dimas AS, Gutierrez-Arcelus M, Stranger BE, Deloukas P and Dermitzakis ET

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.

    Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols.

    Availability: http://www.sanger.ac.uk/resources/software/genevar.

    Funded by: Wellcome Trust

    Bioinformatics (Oxford, England) 2010;26;19;2474-6

  • Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.

    Zhang ZD, Frankish A, Hunt T, Harrow J and Gerstein M

    Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA. zdzmg@gersteinlab.org

    Background: Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.

    Results: We have developed a pipeline to detect human unitary pseudogenes through analyzing the global inventory of orthologs between the human genome and its mammalian relatives. We focus on gene losses along the human lineage after the divergence from rodents about 75 million years ago. In total, we identify 76 unitary pseudogenes, including previously annotated ones, and many novel ones. By comparing each of these to its functioning ortholog in other mammals, we can approximately date the creation of each unitary pseudogene (that is, the gene 'death date') and show that for our group of 76, the functional genes appear to be disabled at a fairly uniform rate throughout primate evolution - not all at once, correlated, for instance, with the 'Alu burst'. Furthermore, we identify 11 unitary pseudogenes that are polymorphic - that is, they have both nonfunctional and functional alleles currently segregating in the human population. Comparing them with their orthologs in other primates, we find that two of them are in fact pseudogenes in non-human primates, suggesting that they represent cases of a gene being resurrected in the human lineage.

    Conclusions: This analysis of unitary pseudogenes provides insights into the evolutionary constraints faced by different organisms and the timescales of functional gene loss in humans.

    Funded by: NHGRI NIH HHS: U54 HG004555; NLM NIH HHS: 1K99LM009770-01; Wellcome Trust: 077198

    Genome biology 2010;11;3;R26
