Sanger Institute - Publications 2010
Number of papers published in 2010: 215
A map of human genome variation from population-scale sequencing.
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Funded by: British Heart Foundation: RG/09/012/28096; Howard Hughes Medical Institute; Intramural NIH HHS; Medical Research Council: G0801823, G0801823(89305); NCRR NIH HHS: S10RR025056; NHGRI NIH HHS: 01HG3229, N01HG62088, P01 HG004120, P01HG4120, P41HG2371, P41HG4221, P41HG4222, P50HG2357, R01 HG003229, R01 HG003229-05, R01 HG004719, R01 HG004719-01, R01 HG004719-02, R01 HG004719-02S1, R01 HG004719-03, R01 HG004719-04, R01HG2651, R01HG3698, R01HG4333, R01HG4719, R01HG4960, RC2 HG005552, RC2 HG005552-01, RC2 HG005552-02, RC2HG5552, U01HG5208, U01HG5209, U01HG5210, U01HG5211, U01HG5214, U41HG4568, U54 HG003273, U54HG2750, U54HG2757, U54HG3067, U54HG3079, U54HG3273; NIGMS NIH HHS: R01GM59290, R01GM72861, T32 GM007753; NIMH NIH HHS: 01MH84698; Wellcome Trust: 075491, 077009, 077014, 077192, 081407, 085532, 086084, 089061, 089062, 089088, WT075491/Z/04, WT077009, WT081407/Z/06/Z, WT085532AIA, WT086084/Z/08/Z, WT089088/Z/09/Z
Genetic evidence of multiple loci in dystocia--difficult labour.
Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden. email@example.com
Background: Dystocia, difficult labour, is a common but also complex problem during childbirth. It can be attributed to either weak contractions of the uterus, a large infant, reduced capacity of the pelvis or combinations of these. Previous studies have indicated that there is a genetic component in the susceptibility of experiencing dystocia. The purpose of this study was to identify susceptibility genes in dystocia.
Methods: A total of 104 women in 47 families were included where at least two sisters had undergone caesarean section at a gestational length of 286 days or more at their first delivery. Medical records were studied and telephone interviews were performed to identify subjects with dystocia. Whole-genome scanning using Affymetrix genotyping arrays and non-parametric linkage (NPL) analysis were performed in 39 women exhibiting the phenotype of dystocia from 19 families. In 68 women, re-sequencing was performed of candidate genes showing suggestive linkage: oxytocin (OXT) on chromosome 20 and oxytocin-receptor (OXTR) on chromosome 3.
Results: We found a trend towards linkage with a suggestive NPL score (3.15) on chromosome 12p12. Suggestive linkage peaks were observed on chromosomes 3, 4, 6, 10 and 20. Re-sequencing of OXT and OXTR did not reveal any causal variants.
Conclusions: Dystocia is likely to have a genetic component with variations in multiple genes affecting the patient outcome. We found 6 loci that could be re-evaluated in larger patient cohorts.
BMC medical genetics 2010;11;105
Data quality control in genetic case-control association studies
Nature Protocols. 2010;5;1564-73
Genome-wide association study of migraine implicates a common susceptibility variant on 8q22.1.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. firstname.lastname@example.org
Migraine is a common episodic neurological disorder, typically presenting with recurrent attacks of severe headache and autonomic dysfunction. Apart from rare monogenic subtypes, no genetic or molecular markers for migraine have been convincingly established. We identified the minor allele of rs1835740 on chromosome 8q22.1 to be associated with migraine (P = 5.38 × 10⁻⁹, odds ratio = 1.23, 95% CI 1.150-1.324) in a genome-wide association study of 2,731 migraine cases ascertained from three European headache clinics and 10,747 population-matched controls. The association was replicated in 3,202 cases and 40,062 controls for an overall meta-analysis P value of 1.69 × 10⁻¹¹ (odds ratio = 1.18, 95% CI 1.127-1.244). rs1835740 is located between MTDH (astrocyte elevated gene 1, also known as AEG-1) and PGCP (encoding plasma glutamate carboxypeptidase). In an expression quantitative trait study in lymphoblastoid cell lines, transcript levels of MTDH were found to have a significant correlation to rs1835740 (P = 3.96 × 10⁻⁵, permuted threshold for genome-wide significance 7.7 × 10⁻⁵). To our knowledge, our data establish rs1835740 as the first genetic risk factor for migraine.
Funded by: Wellcome Trust: 089062, WT089062
Nature genetics 2010;42;10;869-73
Rare variant association analysis methods for complex traits.
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
There has been increasing interest in rare variants and their association with disease, and several rare variant-disease associations have already been detected. The usual association tests for common variants are underpowered for detecting variants of lower frequency, so alternative approaches are required. In addition to reviewing the association analysis methods for rare variants, we discuss the limitations of genome-wide association studies in identifying rare variants and the problems that arise in the imputation of rare variants.
Funded by: Wellcome Trust: WT088885/Z/09/Z
Annual review of genetics 2010;44;293-308
A predominantly neolithic origin for European paternal lineages.
Department of Genetics, University of Leicester, Leicester, United Kingdom.
The relative contributions to modern European populations of Paleolithic hunter-gatherers and Neolithic farmers from the Near East have been intensely debated. Haplogroup R1b1b2 (R-M269) is the commonest European Y-chromosomal lineage, increasing in frequency from east to west, and carried by 110 million European men. Previous studies suggested a Paleolithic origin, but here we show that the geographical distribution of its microsatellite diversity is best explained by spread from a single source in the Near East via Anatolia during the Neolithic. Taken with evidence on the origins of other haplogroups, this indicates that most European Y chromosomes originate in the Neolithic expansion. This reinterpretation makes Europe a prime example of how technological and cultural change is linked with the expansion of a Y-chromosomal lineage, and the contrast of this pattern with that shown by maternally inherited mitochondrial DNA suggests a unique role for males in the transition.
Funded by: Wellcome Trust: 057559, 065569, 084060, 087576
PLoS biology 2010;8;1;e1000285
Curators of the world unite: the International Society of Biocuration.
Bioinformatics (Oxford, England) 2010;26;8;991
DUFs: families in search of function.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England. email@example.com
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.
Funded by: Wellcome Trust: 087656, WT077044/Z/05/Z
Acta crystallographica. Section F, Structural biology and crystallization communications 2010;66;Pt 10;1148-52
Signatures of adaptation to obligate biotrophy in the Hyaloperonospora arabidopsidis genome.
School of Life Sciences, Warwick University, Wellesbourne, CV35 9EF, UK.
Many oomycete and fungal plant pathogens are obligate biotrophs, which extract nutrients only from living plant tissue and cannot grow apart from their hosts. Although these pathogens cause substantial crop losses, little is known about the molecular basis or evolution of obligate biotrophy. Here, we report the genome sequence of the oomycete Hyaloperonospora arabidopsidis (Hpa), an obligate biotroph and natural pathogen of Arabidopsis thaliana. In comparison with genomes of related, hemibiotrophic Phytophthora species, the Hpa genome exhibits dramatic reductions in genes encoding (i) RXLR effectors and other secreted pathogenicity proteins, (ii) enzymes for assimilation of inorganic nitrogen and sulfur, and (iii) proteins associated with zoospore formation and motility. These attributes comprise a genomic signature of evolution toward obligate biotrophy.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C509123/1, BB/E024815/1, BB/E024882/1, BB/F0161901, EP/F500025/1, T12144; Wellcome Trust
Science (New York, N.Y.) 2010;330;6010;1549-51
Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06.
Department of Zoology, University of Oxford, UK. firstname.lastname@example.org
Background: The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally.
Results: Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendants of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent.
Conclusion: The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed.
Funded by: Wellcome Trust: 087622
BMC genomics 2010;11;652
Variants in ACAD10 are associated with type 2 diabetes, insulin resistance and lipid oxidation in Pima Indians.
Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 445 N. 5th Street, Suite 210, Phoenix, AZ 85004, USA.
Aims/hypothesis: A prior genome-wide association study in Pima Indians identified a variant within the ACAD10 gene that is associated with early-onset type 2 diabetes. Acylcoenzyme A dehydrogenase 10 (ACAD10) catalyses mitochondrial fatty acid beta-oxidation, which plays a pivotal role in developing insulin resistance and type 2 diabetes. Therefore, ACAD10 was analysed as a positional and biological candidate for type 2 diabetes.
Methods: Twenty-three SNPs were genotyped in 1,500 Pima Indians to determine the linkage disequilibrium pattern across ACAD10. Association with type 2 diabetes was determined by genotyping four tag single nucleotide polymorphisms (SNPs) in a population-based sample of 3,501 full-heritage Pima Indians; two associated SNPs were further genotyped in a second population-based sample of 3,723 American Indians. Associations with quantitative traits were assessed in 415 non-diabetic full heritage Pima individuals who had been metabolically phenotyped.
Results: SNPs rs601663 and rs659964 were associated with type 2 diabetes in the full-heritage Pima Indian sample (p=0.04 and 0.0006, respectively), and rs659964 was further associated with type 2 diabetes in the second American Indian sample (p=0.04). Combination of these two samples provided the strongest evidence for association (p=0.009 and 0.00007, for rs601663 and rs659964, respectively). Quantitative trait analyses identified nominal associations with both lower lipid oxidation rate and larger subcutaneous abdominal adipocyte size, which is consistent with the known physiology of ACAD10, and also identified associations with increased insulin resistance.
Conclusions/interpretation: We propose that ACAD10 variation may increase type 2 diabetes susceptibility by impairing insulin sensitivity via abnormal lipid oxidation.
Funded by: Intramural NIH HHS: ZIA DK075012-04
Signatures of mutation and selection in the cancer genome.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
The cancer genome is moulded by the dual processes of somatic mutation and selection. Homozygous deletions in cancer genomes occur over recessive cancer genes, where they can confer selective growth advantage, and over fragile sites, where they are thought to reflect an increased local rate of DNA breakage. However, most homozygous deletions in cancer genomes are unexplained. Here we identified 2,428 somatic homozygous deletions in 746 cancer cell lines. These overlie 11% of protein-coding genes that, therefore, are not mandatory for survival of human cells. We derived structural signatures that distinguish between homozygous deletions over recessive cancer genes and fragile sites. Application to clusters of unexplained homozygous deletions suggests that many are in regions of inherent fragility, whereas a small subset overlies recessive cancer genes. The results illustrate how structural signatures can be used to distinguish between the influences of mutation and selection in cancer genomes. The extensive copy number, genotyping, sequence and expression data available for this large series of publicly available cancer cell lines renders them informative reagents for future studies of cancer biology and drug discovery.
Funded by: NCI NIH HHS: P01 CA155258; Wellcome Trust: 077012/Z/05/Z, 088340, 093867
Large, rare chromosomal deletions associated with severe early-onset obesity.
University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
Obesity is a highly heritable and genetically heterogeneous disorder. Here we investigated the contribution of copy number variation to obesity in 300 Caucasian patients with severe early-onset obesity, 143 of whom also had developmental delay. Large (>500 kilobases), rare (<1%) deletions were significantly enriched in patients compared to 7,366 controls (P < 0.001). We identified several rare copy number variants that were recurrent in patients but absent or at much lower prevalence in controls. We identified five patients with overlapping deletions on chromosome 16p11.2 that were found in 2 out of 7,366 controls (P < 5 × 10⁻⁵). In three patients the deletion co-segregated with severe obesity. Two patients harboured a larger de novo 16p11.2 deletion, extending through a 593-kilobase region previously associated with autism and mental retardation; both of these patients had mild developmental delay in addition to severe obesity. In an independent sample of 1,062 patients with severe obesity alone, the smaller 16p11.2 deletion was found in an additional two patients. All 16p11.2 deletions encompass several genes but include SH2B1, which is known to be involved in leptin and insulin signalling. Deletion carriers exhibited hyperphagia and severe insulin resistance disproportionate for the degree of obesity. We show that copy number variation contributes significantly to the genetic architecture of human obesity.
Funded by: Medical Research Council: G0900554; Wellcome Trust: 077014, 077014/Z/05/0Z, 082390, 082390/Z/07/Z, 085475
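The carrier counts reported in the abstract above (5 of 300 patients versus 2 of 7,366 controls with the 16p11.2 deletion) can be checked with a simple enrichment test. The following is a minimal sketch, assuming a one-sided Fisher's exact test; the abstract does not state which test the authors used, and the function name fisher_one_sided is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing a or more carriers
    in the first row, given fixed row and column totals.
    Assumption: this is an illustrative test, not necessarily the one
    used in the paper.
    """
    row1 = a + b          # number of patients
    col1 = a + c          # total number of deletion carriers
    n = a + b + c + d     # all individuals
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Counts taken from the abstract: 5 carriers among 300 patients,
# 2 carriers among 7,366 controls.
p_value = fisher_one_sided(5, 295, 2, 7364)
print(f"one-sided P = {p_value:.2e}")
```

Under these assumptions the resulting P value falls below the 5 × 10⁻⁵ bound quoted in the abstract.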
Variants at DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci are associated with reduced glucose-stimulated beta cell function in middle-aged Danish people.
Hagedorn Research Institute, Niels Steensens Vej 2, 2820 Gentofte, Denmark.
Aims/hypothesis: A meta-analysis of 21 genome-wide association studies identified 11 novel genetic loci implicated in fasting glucose homeostasis. We aimed to evaluate the impact of these variants on insulin release and insulin sensitivity estimated from OGTTs.
Methods: Eleven variants in or near DGKB/TMEM195, ADCY5, MADD, ADRA2A, FADS1, CRY2, SLC2A2, GLIS3, PROX1, C2CD4B and IGF1 were genotyped in 6,784 middle-aged participants of the population-based Inter99 cohort. Association studies of quantitative estimates of insulin release and insulin sensitivity were performed in 5,722 non-diabetic Danish participants on whom an OGTT was performed.
Results: Assuming an additive genetic model, carriers of the alleles increasing fasting glucose in DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B showed decreased glucose-stimulated insulin release as assessed by the BIGTT-acute insulin response index (2.7-3.5%; p < 0.005 for all) and by corrected insulin response (2.8-5.9%; p < 0.03 for all). In addition, the PROX1 glucose-raising allele showed a 2.9% decreased corrected insulin response (p = 0.03), while the hyperglycaemic alleles of variants in or near ADRA2A, FADS1, CRY2 and C2CD4B were associated with a 2.6% to 9.3% decrease in one or both of two different OGTT-based disposition indices (p < 0.02 for all). After correction for multiple testing, variants in the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with estimates of beta cell function.
Conclusions/interpretation: We found that the lead variants at the DGKB/TMEM195, ADRA2A, GLIS3 and C2CD4B loci were associated with decreased glucose-stimulated insulin response. This association underlines the importance of pancreatic beta cell dysfunction in the genetic predisposition to hyperglycaemia and type 2 diabetes.
Large-scale association analysis of TNF/LTA gene region polymorphisms in type 2 diabetes.
Department of Medical Biology, University of Split School of Medicine, Split, Croatia. email@example.com
Background: The TNF/LTA locus has been a long-standing T2D candidate gene. Several studies have examined association of TNF/LTA SNPs with T2D but the majority have been small-scale and produced no convincing evidence of association. The purpose of this study is to examine T2D association of tag SNPs in the TNF/LTA region capturing the majority of common variation in a large-scale sample set of UK/Irish origin.
Methods: This study comprised a case-control (1520 cases and 2570 control samples) and a family-based component (423 parent-offspring trios). Eleven tag SNPs (rs928815, rs909253, rs746868, rs1041981 (T60N), rs1800750, rs1800629 (G-308A), rs361525 (G-238A), rs3093662, rs3093664, rs3093665, and rs3093668) were selected across the TNF/LTA locus and genotyped using a fluorescence-based competitive allele specific assay. Quality control of the obtained genotypes was performed prior to single- and multi-point association analyses under the additive model.
Results: We did not find any consistent SNP associations with T2D in the case-control or family-based datasets.
Conclusions: The present study, designed to analyse a set of tag SNPs specifically selected to capture the majority of common variation in the TNF/LTA gene region, found no robust evidence for association with T2D. To investigate the presence of smaller effects of TNF/LTA gene variation with T2D, a large-scale meta-analysis will be required.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113, WT088885/Z/09/Z
BMC medical genetics 2010;11;69
53BP1 loss rescues BRCA1 deficiency and is associated with triple-negative and BRCA-mutated breast cancers.
Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
Germ-line mutations in breast cancer 1, early onset (BRCA1) result in predisposition to breast and ovarian cancer. BRCA1-mutated tumors show genomic instability, mainly as a consequence of impaired recombinatorial DNA repair. Here we identify p53-binding protein 1 (53BP1) as an essential factor for sustaining the growth arrest induced by Brca1 deletion. Depletion of 53BP1 abrogates the ATM-dependent checkpoint response and G2 cell-cycle arrest triggered by the accumulation of DNA breaks in Brca1-deleted cells. This effect of 53BP1 is specific to BRCA1 function, as 53BP1 depletion did not alleviate proliferation arrest or checkpoint responses in Brca2-deleted cells. Notably, loss of 53BP1 partially restores the homologous-recombination defect of Brca1-deleted cells and reverts their hypersensitivity to DNA-damaging agents. We find reduced 53BP1 expression in subsets of sporadic triple-negative and BRCA-associated breast cancers, indicating the potential clinical implications of our findings.
Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356
Nature structural & molecular biology 2010;17;6;688-95
Rare variation at the TNFAIP3 locus and susceptibility to rheumatoid arthritis.
Arthritis Research UK, Epidemiology Unit, University of Manchester, Manchester, UK.
Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphism (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower frequency variants is yet to be practically investigated. Here we describe the application of a rare variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed genotypes of low frequency variants (defined here as MAF ≤ 0.05) into a single count coupled with univariate analysis. We then prioritized gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on rare variant signal p value and prior evidence to support involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected replicating association to low frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10⁻⁶). Even though rare variants are not well-represented and can be difficult to genotype in GWAS, our study supports the application of low frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.
Funded by: Arthritis Research UK: 17552, 18475; Wellcome Trust: 064890, 081682
Human genetics 2010;128;6;627-33
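The collapsing step described in the abstract above (gene-centric bins, variants with MAF ≤ 0.05 summed into a single count per individual) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the input layout and the toy data are assumptions, and genotypes are assumed to be coded as minor-allele counts (0, 1 or 2).

```python
def minor_allele_frequency(genotypes):
    """MAF for one variant, given per-individual minor-allele counts (0/1/2)."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_by_gene(variants, maf_max=0.05):
    """Collapse low-frequency variants into one count per individual per gene.

    variants: list of (gene_name, genotypes) pairs (hypothetical layout).
    Returns {gene_name: [collapsed count for each individual]}.
    """
    burden = {}
    for gene, genotypes in variants:
        if minor_allele_frequency(genotypes) > maf_max:
            continue  # common variant: left out of the collapsed count
        counts = burden.setdefault(gene, [0] * len(genotypes))
        for i, g in enumerate(genotypes):
            counts[i] += g
    return burden


# Toy data for 20 individuals (invented for illustration).
rare_a = [1] + [0] * 19          # MAF = 1/40 = 0.025
rare_b = [0, 1] + [0] * 18       # MAF = 0.025
common = [1] * 10 + [0] * 10     # MAF = 10/40 = 0.25, excluded
variants = [("GENE1", rare_a), ("GENE1", rare_b), ("GENE1", common)]

burden = collapse_by_gene(variants)
print(burden["GENE1"])
```

The collapsed count per gene bin could then be tested against case-control status with any univariate test; the abstract does not specify which one was used.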
Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop
Scoring and validation of tandem MS peptide identification methods.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.
Methods in molecular biology (Clifton, N.J.) 2010;604;43-53
Quantifying the mechanisms of domain gain in animal proteins.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. firstname.lastname@example.org
Background: Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms.
Results: Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events.
Conclusions: The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Genome biology 2010;11;7;R74
The patterns and dynamics of genomic instability in metastatic pancreatic cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97-98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations, but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours, including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites, and how the tumour disseminates. Here we harness advances in DNA sequencing to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2-M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.
Funded by: NCI NIH HHS: CA106610, CA140599, K08 CA106610, K08 CA106610-03, K08 CA106610-04, K08 CA106610-05, R01 CA140599, R01 CA140599-01, R01 CA140599-02, R01 CA140599-03; Wellcome Trust: 077012/Z/05/Z, 088340, 093867, WT088340MA
Beyond the Genome: genomics research ten years after the human genome sequence.
Department of Genetics, Stanford University, Stanford, CA 94305, USA. email@example.com
A report on the meeting 'Beyond the Genome', Boston, USA, 11-13 October 2010.
Genome biology 2010;11;11;309
Molecular and physiological analysis of three Pseudomonas aeruginosa phages belonging to the "N4-like viruses".
Division of Gene Technology, Katholieke Universiteit Leuven, Kasteelpark Arenberg, Leuven, B-3001, Belgium.
We present a detailed analysis of the genome architecture, structural proteome and infection-related properties of three Pseudomonas phages, designated LUZ7, LIT1 and PEV2. These podoviruses encapsulate 72.5 to 74.9 kb genomes and lyse their host after 25 min aerobic infection. PEV2 can successfully infect under anaerobic conditions, but its latent period is tripled, the lysis proceeds far slower and the burst size decreases significantly. While the overall genome structure of these phages resembles the well-studied coliphage N4, these Pseudomonas phages encode a cluster of tail genes which displays significant similarity to a Pseudomonas aeruginosa (cryptic) prophage region. Using ESI-MS/MS, these tail proteins were shown to be part of the phage particle, as well as ten other proteins including a giant 370 kDa virion RNA polymerase. These phages are the first described representatives of a novel kind of obligatory lytic P. aeruginosa-infecting phages, belonging to the widespread "N4-like viruses" genus.
Funded by: Wellcome Trust
Genetic loci influencing kidney function and chronic kidney disease.
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College of London, London, UK. firstname.lastname@example.org
Using genome-wide association, we identify common variants at 2p12-p13, 6q26, 17q23 and 19q13 associated with serum creatinine, a marker of kidney function (P = 10⁻¹⁰ to 10⁻¹⁵). Of these, rs10206899 (near NAT8, 2p12-p13) and rs4805834 (near SLC7A9, 19q13) were also associated with chronic kidney disease (P = 5.0 × 10⁻⁵ and P = 3.6 × 10⁻⁴, respectively). Our findings provide insight into metabolic, solute and drug-transport pathways underlying susceptibility to chronic kidney disease.
Nature genetics 2010;42;5;373-5
The impact of gene expression regulation on evolution of extracellular signaling pathways.
Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom. email@example.com
Extracellular protein interactions are crucial to the development of multicellular organisms because they initiate signaling pathways and enable cellular recognition cues. Despite their importance, extracellular protein interactions are often under-represented in large scale protein interaction data sets because most high throughput assays are not designed to detect low affinity extracellular interactions. Due to the lack of a comprehensive data set, the evolution of extracellular signaling pathways has remained largely a mystery. We investigated this question using a combined data set of physical pairwise interactions between zebrafish extracellular proteins, mainly from the immunoglobulin superfamily and leucine-rich repeat families, and their spatiotemporal expression profiles. We took advantage of known homology between proteins to estimate the relative rates of changes of four parameters after gene duplication, namely extracellular protein interaction, expression pattern, and the divergence of extracellular and intracellular protein sequences. We showed that change in expression profile is a major contributor to the evolution of signaling pathways followed by divergence in intracellular protein sequence, whereas extracellular sequence and interaction profiles were relatively more conserved. Rapidly evolving expression profiles will eventually drive other parameters to diverge more quickly because differentially expressed proteins get exposed to different environments and potential binding partners. This allows homologous extracellular receptors to attain specialized functions and become specific to tissues and/or developmental stages.
Funded by: Medical Research Council: MC_U105161047; Wellcome Trust: 077108/Z/05/Z
Molecular & cellular proteomics : MCP 2010;9;12;2666-77
Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042.
Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.
Background: Escherichia coli can experience a multifaceted life, in some cases acting as a commensal while in other cases causing intestinal and/or extraintestinal disease. Several studies suggest enteroaggregative E. coli are the predominant cause of E. coli-mediated diarrhea in the developed world and are second only to Campylobacter sp. as a cause of bacterial-mediated diarrhea. Furthermore, enteroaggregative E. coli are a predominant cause of persistent diarrhea in the developing world where infection has been associated with malnourishment and growth retardation.
Methods: In this study we determined the complete genomic sequence of E. coli 042, the prototypical member of the enteroaggregative E. coli, which has been shown to cause disease in volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains revealing previously uncharacterised virulence factors including a variety of secreted proteins and a capsular polysaccharide biosynthetic locus. In addition, by using Biolog Phenotype Microarrays we have provided a full metabolic profiling of E. coli 042 and the non-pathogenic lab strain E. coli K-12. We have highlighted the genetic basis for many of the metabolic differences between E. coli 042 and E. coli K-12.
Conclusion: This study provides a genetic context for the vast amount of experimental and epidemiological data published thus far and provides a template for future diagnostic and intervention strategies.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C510075/1; Medical Research Council: G0801209
PloS one 2010;5;1;e8801
Ensembl variation resources.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.
Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.
Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.
Funded by: Medical Research Council; Wellcome Trust
BMC genomics 2010;11;293
Common variants near TERC are associated with mean telomere length.
Department of Cardiovascular Sciences, University of Leicester, Glenfield Hospital, Leicester, UK.
We conducted genome-wide association analyses of mean leukocyte telomere length in 2,917 individuals, with follow-up replication in 9,492 individuals. We identified an association with telomere length on 3q26 (rs12696304, combined P = 3.72 x 10(-14)) at a locus that includes TERC, which encodes the telomerase RNA component. Each copy of the minor allele of rs12696304 was associated with an approximately 75-base-pair reduction in mean telomere length, equivalent to approximately 3.6 years of age-related telomere-length attrition.
Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust
Nature genetics 2010;42;3;197-9
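The two figures quoted in the telomere-length abstract above (about 75 base pairs per minor allele, equivalent to about 3.6 years of attrition) imply an average attrition rate. The following is a minimal sketch of that arithmetic, assuming attrition is linear with age and that allele effects are additive; neither assumption is stated in the abstract, and the variable names are illustrative only.

```python
# Figures quoted in the abstract (both approximate).
bp_per_minor_allele = 75.0   # reduction in mean telomere length, base pairs
equivalent_years = 3.6       # stated age-equivalent of that reduction

# Assumption (not from the abstract): attrition is linear in age.
implied_bp_per_year = bp_per_minor_allele / equivalent_years

# Assumption (not from the abstract): effects of the two copies simply add.
bp_two_copies = 2 * bp_per_minor_allele
years_two_copies = bp_two_copies / implied_bp_per_year

print(f"implied attrition: {implied_bp_per_year:.1f} bp/year")
print(f"two copies: {bp_two_copies:.0f} bp, about {years_two_copies:.1f} years")
```

Under these assumptions the implied rate is roughly 21 base pairs per year.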
The dopamine β-hydroxylase -1021C/T polymorphism is associated with the risk of Alzheimer's disease in the Epistasis Project.
Neurology Service and Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas, Marqués de Valdecilla University Hospital (University of Cantabria), 39008 Santander, Spain. firstname.lastname@example.org
Background: The loss of noradrenergic neurones of the locus coeruleus is a major feature of Alzheimer's disease (AD). Dopamine β-hydroxylase (DBH) catalyses the conversion of dopamine to noradrenaline. Interactions have been reported between the low-activity -1021T allele (rs1611115) of DBH and polymorphisms of the pro-inflammatory cytokine genes, IL1A and IL6, contributing to the risk of AD. We therefore examined the associations with AD of the DBH -1021T allele and of the above interactions in the Epistasis Project, with 1757 cases of AD and 6294 elderly controls.
Methods: We genotyped eight single nucleotide polymorphisms (SNPs) in the three genes, DBH, IL1A and IL6. We used logistic regression models and synergy factor analysis to examine potential interactions and associations with AD.
Results: We found that the presence of the -1021T allele was associated with AD: odds ratio = 1.2 (95% confidence interval: 1.06-1.4, p = 0.005). This association was nearly restricted to men < 75 years old: odds ratio = 2.2 (1.4-3.3, 0.0004). We also found an interaction between the presence of DBH -1021T and the -889TT genotype (rs1800587) of IL1A: synergy factor = 1.9 (1.2-3.1, 0.005). All these results were consistent between North Europe and North Spain.
Conclusions: Extensive, previous evidence (reviewed here) indicates an important role for noradrenaline in the control of inflammation in the brain. Thus, the -1021T allele with presumed low activity may be associated with misregulation of inflammation, which could contribute to the onset of AD. We suggest that such misregulation is the predominant mechanism of the association we report here.
Funded by: Medical Research Council: G0400546
BMC medical genetics 2010;11;162
Mutation spectrum revealed by breakpoint sequencing of human germline CNVs.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.
Funded by: Wellcome Trust: 077014, 077014/Z/05/Z
Nature genetics 2010;42;5;385-91
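The breakpoint abstract above describes the 10% co-occurrence of microhomology and inserted sequence as low. One way to see why is to compare it with the overlap expected if the two signatures were independent. The sketch below does this, assuming all three percentages share the same denominator (the deletion breakpoints), which the abstract does not state explicitly; the variable names are illustrative.

```python
# Fractions of deletion breakpoints quoted in the abstract.
p_microhomology = 0.70
p_insertion = 0.33
p_both_observed = 0.10

# Co-occurrence expected if the two signatures were independent.
p_both_expected = p_microhomology * p_insertion

# Inclusion-exclusion: fraction with at least one signature, and with neither.
# Assumption: all three fractions refer to the same set of breakpoints.
p_either = p_microhomology + p_insertion - p_both_observed
p_neither = 1.0 - p_either

print(f"expected overlap if independent: {p_both_expected:.1%}")
print(f"observed overlap:                {p_both_observed:.1%}")
print(f"at least one signature:          {p_either:.1%}")
print(f"neither signature:               {p_neither:.1%}")
```

The observed overlap is less than half of what independence would predict, which is consistent with the abstract's suggestion of at least two distinct mutational mechanisms.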
Origins and functional impact of copy number variation in the human genome.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK.
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Funded by: Canadian Institutes of Health Research; NHGRI NIH HHS: HG004221; NIGMS NIH HHS: GM081533; Wellcome Trust: 077006/Z/05/Z, 077008, 077009, 077014, 088340
Strong genetic evidence for a selective influence of GABAA receptors on a component of the bipolar disorder phenotype.
Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, UK. email@example.com
Despite compelling evidence for a major genetic contribution to risk of bipolar mood disorder, conclusive evidence implicating specific genes or pathophysiological systems has proved elusive. In part this is likely to be related to the unknown validity of current phenotype definitions and consequent aetiological heterogeneity of samples. In the recent Wellcome Trust Case Control Consortium genome-wide association analysis of bipolar disorder (1868 cases, 2938 controls) one of the most strongly associated polymorphisms lay within the gene encoding the GABA(A) receptor beta1 subunit, GABRB1. Aiming to increase biological homogeneity, we sought the diagnostic subset that showed the strongest signal at this polymorphism and used this to test for independent evidence of association with other members of the GABA(A) receptor gene family. The index signal was significantly enriched in the 279 cases meeting Research Diagnostic Criteria for schizoaffective disorder, bipolar type (P=3.8 x 10(-6)). Independently, these cases showed strong evidence that variation in GABA(A) receptor genes influences risk for this phenotype (independent system-wide P=6.6 x 10(-5)) with association signals also at GABRA4, GABRB3, GABRA5 and GABRR3. Our findings have the potential to inform understanding of presentation, pathogenesis and nosology of bipolar disorders. Our method of phenotype refinement may be useful in studies of other complex psychiatric and non-psychiatric disorders.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 079643
Molecular psychiatry 2010;15;2;146-53
A rapid and scalable method for selecting recombinant mouse monoclonal antibodies.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, UK.
Background: Monoclonal antibodies with high affinity and selectivity that work on wholemount fixed tissues are valuable reagents to the cell and developmental biologist, and yet isolating them remains a long and unpredictable process. Here we report a rapid and scalable method to select and express recombinant mouse monoclonal antibodies that are essentially equivalent to those secreted by parental IgG-isotype hybridomas.
Results: Increased throughput was achieved by immunizing mice with pools of antigens and cloning - from small numbers of hybridoma cells - the functionally rearranged light and heavy chains into a single expression plasmid. By immunizing with the ectodomains of zebrafish cell surface receptor proteins expressed in mammalian cells and screening for formalin-resistant epitopes, we selected antibodies that gave expected staining patterns on wholemount fixed zebrafish embryos.
Conclusions: This method can be used to quickly select several high quality monoclonal antibodies from a single immunized mouse and facilitates their distribution using plasmids.
Funded by: NINDS NIH HHS: R01NS063400; Wellcome Trust: 077108/Z/05/Z
BMC biology 2010;8;76
A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407.
The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, United Kingdom.
In most cases, Escherichia coli exists as a harmless commensal organism, but it may on occasion cause intestinal and/or extraintestinal disease. Enterotoxigenic E. coli (ETEC) is the predominant cause of E. coli-mediated diarrhea in the developing world and is responsible for a significant portion of pediatric deaths. In this study, we determined the complete genomic sequence of E. coli H10407, a prototypical strain of enterotoxigenic E. coli, which reproducibly elicits diarrhea in human volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains, revealing that the chromosome is closely related to that of the nonpathogenic commensal strain E. coli HS and to those of the laboratory strains E. coli K-12 and C. Furthermore, these analyses demonstrated that there were no chromosomally encoded factors unique to any sequenced ETEC strains. Comparison of the E. coli H10407 plasmids with those from several ETEC strains revealed that the plasmids had a mosaic structure but that several loci were conserved among ETEC strains. This study provides a genetic context for the vast amount of experimental and epidemiological data that have been published.
Funded by: Biotechnology and Biological Sciences Research Council: BB/C510075/1; Medical Research Council: G0801209; Wellcome Trust
Journal of bacteriology 2010;192;21;5822-31
Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes.
Cancer Genome Project, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Clear cell renal cell carcinoma (ccRCC) is the most common form of adult kidney cancer, characterized by the presence of inactivating mutations in the VHL gene in most cases, and by infrequent somatic mutations in known cancer genes. To determine further the genetics of ccRCC, we have sequenced 101 cases through 3,544 protein-coding genes. Here we report the identification of inactivating mutations in two genes encoding enzymes involved in histone modification: SETD2, a histone H3 lysine 36 methyltransferase, and JARID1C (also known as KDM5C), a histone H3 lysine 4 demethylase. These add to the mutations in the histone H3 lysine 27 demethylase UTX (KDM6A) that we recently reported. The results highlight the role of mutations in components of the chromatin modification machinery in human cancer. Furthermore, NF2 mutations were found in non-VHL mutated ccRCC, and several other probable cancer genes were identified. These results indicate that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene, and that systematic screens will be key to fully determining the somatic genetic architecture of cancer.
Funded by: Wellcome Trust: 077012, 077012/Z/05/Z, 082359, 088340, 093867
Analysis of TBC1D4 in patients with severe insulin resistance.
Funded by: Medical Research Council: G0600414, G0800203, MC_U106179471, MC_U117588499; NIDDK NIH HHS: DK25336, R01 DK025336, R56 DK025336; Wellcome Trust: 072070, 077016, 088316
Leishmania-specific surface antigens show sub-genus sequence variation and immune recognition.
Centre for Immunology and Infection, Department of Biology, Hull York Medical School, University of York, York, United Kingdom.
Background: A family of hydrophilic acylated surface (HASP) proteins, containing extensive and variant amino acid repeats, is expressed at the plasma membrane in infective extracellular (metacyclic) and intracellular (amastigote) stages of Old World Leishmania species. While HASPs are antigenic in the host and can induce protective immune responses, the biological functions of these Leishmania-specific proteins remain unresolved. Previous genome analysis has suggested that parasites of the sub-genus Leishmania (Viannia) have lost HASP genes from their genomes.
We have used molecular and cellular methods to analyse HASP expression in New World Leishmania mexicana complex species and show that, unlike in L. major, these proteins are expressed predominantly following differentiation into amastigotes within macrophages. Further genome analysis has revealed that the L. (Viannia) species, L. (V.) braziliensis, does express HASP-like proteins of low amino acid similarity but with similar biochemical characteristics, from genes present on a region of chromosome 23 that is syntenic with the HASP/SHERP locus in Old World Leishmania species and the L. (L.) mexicana complex. A related gene is also present in Leptomonas seymouri and this may represent the ancestral copy of these Leishmania-genus specific sequences. The L. braziliensis HASP-like proteins (named the orthologous (o) HASPs) are predominantly expressed on the plasma membrane in amastigotes and are recognised by immune sera taken from 4 out of 6 leishmaniasis patients tested in an endemic region of Brazil. Analysis of the repetitive domains of the oHASPs has shown considerable genetic variation in parasite isolates taken from the same patients, suggesting that antigenic change may play a role in immune recognition of this protein family.
These findings confirm that antigenic hydrophilic acylated proteins are expressed from genes in the same chromosomal region in species across the genus Leishmania. These proteins are surface-exposed on amastigotes (although L. (L.) major parasites also express HASPB on the metacyclic plasma membrane). The central repetitive domains of the HASPs are highly variant in their amino acid sequences, both within and between species, consistent with a role in immune recognition in the host.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0900950, G9721629; Wellcome Trust: 048615, 076355, 077503
PLoS neglected tropical diseases 2010;4;9;e829
Ectodomains of the LDL receptor-related proteins LRP1b and LRP4 have anchorage independent functions in vivo.
Department of Molecular Genetics, UT Southwestern, Dallas, Texas, United States of America.
Background: The low-density lipoprotein (LDL) receptor gene family is a highly conserved group of membrane receptors with diverse functions in developmental processes, lipoprotein trafficking, and cell signaling. The low-density lipoprotein (LDL) receptor-related protein 1b (LRP1B) was reported to be deleted in several types of human malignancies, including non-small cell lung cancer. Our group has previously reported that a distal extracellular truncation of murine Lrp1b that is predicted to secrete the entire intact extracellular domain (ECD) is fully viable with no apparent phenotype.
Methods and principal findings: Here, we have used a gene targeting approach to create two mouse lines carrying internally rearranged exons of Lrp1b that are predicted to truncate the protein closer to the N-terminus and to prevent normal trafficking through the secretory pathway. Both mutations result in early embryonic lethality, but, as expected from the restricted expression pattern of LRP1b in vivo, loss of Lrp1b does not cause cellular lethality as homozygous Lrp1b-deficient blastocysts can be propagated normally in culture. This is similar to findings for another LDL receptor family member, Lrp4. We provide in vitro evidence that Lrp4 undergoes regulated intramembraneous processing through metalloproteases and gamma-secretase cleavage. We further demonstrate negative regulation of the Wnt signaling pathway by the soluble extracellular domain.
Conclusions and significance: Our results underline a crucial role for Lrp1b in development. The expression in mice of truncated alleles of Lrp1b and Lrp4 with deletions of the transmembrane and intracellular domains leads to release of the extracellular domain into the extracellular space, which is sufficient to confer viability. In contrast, null mutations are embryonically (Lrp1b) or perinatally (Lrp4) lethal. These findings suggest that the extracellular domains of both proteins may function as a scavenger for signaling ligands or signal modulators in the extracellular space, thereby preserving signaling thresholds that are critical for embryonic development, as well as for the clear, but poorly understood role of LRP1b in cancer.
Funded by: Cancer Research UK; NHLBI NIH HHS: R37 HL063762; Wellcome Trust
PloS one 2010;5;4;e9960
Multiple common variants for celiac disease influencing immune gene expression.
Blizard Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.
We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10(-4) and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 x 10(-8)); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.
Funded by: Medical Research Council: G0700545, G0700545(82277); NIDDK NIH HHS: DK050678, DK071003, DK081645, DK57892, R01 DK081645; NINDS NIH HHS: NS058980; Wellcome Trust: 084743
Nature genetics 2010;42;4;295-302
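The sample sizes and the expression-QTL proportion in the celiac disease abstract above can be checked with simple arithmetic. The sketch below is illustrative only: the combined totals are sums derived here from the quoted stage sizes, not figures stated in the abstract, and the variable names are hypothetical.

```python
# Stage sizes quoted in the abstract.
gwas_cases, gwas_controls = 4533, 10750
followup_cases, followup_controls = 4918, 5684

# Derived totals (computed here, not quoted in the abstract).
total_cases = gwas_cases + followup_cases
total_controls = gwas_controls + followup_controls

# Proportion of tested loci whose risk variants correlated with cis expression.
eqtl_loci, tested_loci = 20, 38
eqtl_percent = round(100 * eqtl_loci / tested_loci, 1)

print(f"combined cases: {total_cases}, combined controls: {total_controls}")
print(f"loci with cis-expression correlation: {eqtl_percent}%")
```

The computed percentage matches the 52.6% given in the abstract.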
New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.
Department of Biostatistics, Boston University School of Public Health, Massachusetts, USA.
Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.
Funded by: British Heart Foundation: RG/07/008/23674; Chief Scientist Office: CZB/4/710; Medical Research Council: G0100222, G0600331, G0600705, G0601261, G0700222, G0700222(81696), G0701863, G0801056, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128, MC_U127592696, MC_U137686857, MC_UP_A620_1014, MC_UP_A620_1015; NIDDK NIH HHS: K24 DK080140, P30 DK040561, P30 DK040561-14, R01 DK029867, R01 DK072193, R01 DK078616, R01 DK078616-01A1; The Dunhill Medical Trust: R69/0208; Wellcome Trust: 064890, 077011, 077016, 081682, 088885, 089061, 090532, 091746
Nature genetics 2010;42;2;105-16
Traces of sub-Saharan and Middle Eastern lineages in Indian Muslim populations.
National DNA Analysis Centre, Central Forensic Science Laboratory, Kolkata, India.
Islam is the second most practiced religion in India, next to Hinduism. It is still unclear whether the spread of Islam in India has been only a cultural transformation or is associated with detectable levels of gene flow. To estimate the contribution of West Asian and Arabian admixture to Indian Muslims, we assessed genetic variation in mtDNA, Y-chromosomal and LCT/MCM6 markers in 472, 431 and 476 samples, respectively, representing six Muslim communities from different geographical regions of India. We found that most of the Indian Muslim populations received their major genetic input from geographically close non-Muslim populations. However, low levels of likely sub-Saharan African, Arabian and West Asian admixture were also observed among Indian Muslims in the form of L0a2a2 mtDNA and E1b1b1a and J(*)(xJ2) Y-chromosomal lineages. The distinction between Iranian and Arabian sources was difficult to make with mtDNA and the Y chromosome, as the estimates were highly correlated because of similar gene pool compositions in the sources. In contrast, the LCT/MCM6 locus, which shows a clear distinction between the two sources, enabled us to rule out significant gene flow from Arabia. Overall, our results support a model according to which the spread of Islam in India was predominantly cultural conversion associated with minor but still detectable levels of gene flow from outside, primarily from Iran and Central Asia, rather than directly from the Arabian Peninsula.
Funded by: Wellcome Trust: 077009
European journal of human genetics : EJHG 2010;18;3;354-63
Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies.
Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.
To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P = 5.4 × 10⁻⁶⁰) and 9q31.2 (P = 2.2 × 10⁻³³), we identified 30 new menarche loci (all P < 5 × 10⁻⁸) and found suggestive evidence for a further 10 loci (P < 1.9 × 10⁻⁶). The new loci included four previously associated with body mass index (in or near FTO, SEC16B, TRA2B and TMEM18), three in or near other genes implicated in energy homeostasis (BSX, CRTC1 and MCHR2) and three in or near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and gene-set enrichment pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.
Funded by: Canadian Institutes of Health Research: 166067; Cancer Research UK: 10118, A10119, A10124; Chief Scientist Office: CZB/4/710; Intramural NIH HHS; Medical Research Council: G0000934, G0401527, G0500539, G0600705, G0701863, G9815508, MC_U106179471, MC_U106179472, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA089392, CA104021, CA136792, CA40356, CA54281, CA63464, CA98233, P01 CA055075, P01 CA055075-17, P01 CA087969, P01 CA087969-13, P01 CA089392, P01 CA089392-08, P01 CA089392-09, P01CA055075, P01CA087969, R01 CA040356-15S1, R01 CA047988, R01 CA047988-20, R01 CA063464, R01 CA063464-10, R01 CA104021-05, R37 CA054281, R37 CA054281-17, U01 CA098233, U01 CA098233-08, U01 CA136792, U01 CA136792-03, Z01 CP010200-03, Z01CP010200; NCRR NIH HHS: M01 RR 16500, M01 RR-00750, M01 RR000750-31, M01 RR016500-04, U54RR025204-01, UL1 RR025005, UL1 RR025005-05, UL1 RR025774, UL1 RR025774-05, UL1RR025005; NHGRI NIH HHS: U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, U01 HG004415, U01 HG004415-02, U01 HG004422, U01 HG004422-01, U01 HG004422-02, U01 HG004423, U01 HG004423-01, U01 HG004424-04, U01 HG004436, U01 HG004436-02, U01 HG004438, U01 HG004438-04, U01 HG004446, U01 HG004446-04, U01 HG004726, U01 HG004726-02, U01 HG004728, U01 HG004728-01, U01 HG004729, U01 HG004729-02, U01 HG004735, U01 HG004735-02, U01 HG004738, U01 HG004738-02, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004436, U01HG004438, U01HG004446, U01HG004728, U01HG004729, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL 043851, HL087679, HL69757, N01 HC025195, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02 HL64278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-11, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087679-03, R01 HL088119, R01 HL088119-04, 
R01HL086694, R01HL087641, R01HL59367, RC2 HL102419, RC2 HL102419-02, U01 HL072515, U01 HL072515-06, U01 HL084756, U01 HL084756-03, U01 HL84756, U01HL72515, U19 HL069757, U19 HL069757-11; NIA NIH HHS: AG-16592, N.1-AG-1-1, N.1-AG-1-2111, N01 AG012100, N01 AG012109, N01 AG050002, N01-AG-1-2109, N01-AG-12100, N01-AG-5-0002, P01 AG018397, P01 AG018397-08, P01 AG025204, P01 AG025204-03, P01-AG-18397, R01 AG016592, R01 AG016592-10, R01 AG041517, R01 AR/AG 41398, R21 AG032598, R21 AG032598-02, R21AG032598; NIAAA NIH HHS: AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, K05 AA017688, R01 AA007535, R01 AA007535-08, R01 AA013320, R01 AA013320-05, R01 AA013321, R01 AA013321-05, R01 AA013326-05, R01 AA014041-05, U10 AA008401, U10 AA008401-23, U10AA008401; NIAMS NIH HHS: R01 AR041398, R01 AR041398-15, R01 AR041398-20; NICHD NIH HHS: HD-061437, R03 HD061437, R03 HD061437-02; NIDA NIH HHS: R01 DA012854, R01 DA012854-09, R01 DA013423, R01 DA013423-05, R01 DA019963, R01 DA019963-01A2, R01 DA019963-02, R01 DA019963-03, R01-DA013423; NIDCR NIH HHS: U01 DE018903, U01 DE018903-02, U01 DE018993, U01 DE018993-01, U01DE018903, U01DE018993; NIDDK NIH HHS: P30 DK072488, R01 DK058845, R01 DK058845-11, R01DK058845, U01 DK062418, U01 DK062418-06; NIMH NIH HHS: MH66206, R01 MH066206, R01 MH066206-05; NIMHD NIH HHS: 263 MD 821336, 263 MD 9164, 263 MD821336, 263 MD9164 13; PHS HHS: HHSN268200625226C, HHSN268200782096C, R01-088119, RFAHG006033; Wellcome Trust: 068545/Z/02, 076467/Z/05/Z, 077016/Z/05/Z, 079895, 89061/Z/09/Z
Nature genetics 2010;42;12;1077-85
A high-throughput pharmaceutical screen identifies compounds with specific toxicity against BRCA2-deficient tumors.
Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, the Netherlands.
Purpose: Hereditary breast cancer is partly explained by germline mutations in BRCA1 and BRCA2. Although patients carry heterozygous mutations, their tumors have typically lost the remaining wild-type allele. Selectively targeting BRCA deficiency may therefore constitute an important therapeutic approach. Clinical trials applying this principle are underway, but it is unknown whether the compounds tested are optimal. It is therefore important to identify alternative compounds that specifically target BRCA deficiency and to test new combination therapies to establish optimal treatment strategies.
Experimental design: We did a high-throughput pharmaceutical screen on BRCA2-deficient mouse mammary tumor cells and isogenic controls with restored BRCA2 function. Subsequently, we validated positive hits in vitro and in vivo using mice carrying BRCA2-deficient mammary tumors.
Results: Three alkylators (chlorambucil, melphalan, and nimustine) displayed strong and specific toxicity against BRCA2-deficient cells. In vivo, these showed heterogeneous but generally strong BRCA2-deficient antitumor activity, with melphalan and nimustine doing better than cisplatin and the poly-(ADP-ribose)-polymerase inhibitor olaparib (AZD2281) in this small study. In vitro drug combination experiments showed synergistic interactions between the alkylators and olaparib. Tumor intervention studies combining nimustine and olaparib resulted in recurrence-free survival exceeding 330 days in 3 of 5 animals tested.
Conclusions: We generated and validated a platform for identification of compounds with specific activity against BRCA2-deficient cells that translates well to the preclinical setting. Our data call for the re-evaluation of alkylators, especially melphalan and nimustine, alone or in combination with the poly-(ADP-ribose)-polymerase inhibitors, for the treatment of breast cancers with a defective BRCA pathway.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D012910/1; Cancer Research UK; Wellcome Trust
Clinical cancer research : an official journal of the American Association for Cancer Research 2010;16;1;99-108
The genetics of obesity: FTO leads the way.
Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
In 2007, an association of single nucleotide polymorphisms (SNPs) in the fat mass and obesity-associated (FTO) gene region with body mass index (BMI) and risk of obesity was identified in multiple populations, making FTO the first locus unequivocally associated with adiposity. At the time, FTO was a gene of unknown function and it was not known whether these SNPs exerted their effect on adiposity by affecting FTO or neighboring genes. Therefore, this breakthrough association inspired a wealth of in silico, in vitro, and in vivo analyses in model organisms and humans to improve knowledge of FTO function. These studies suggested that FTO plays a role in controlling feeding behavior and energy expenditure. Here, we review the approaches taken that provide a blueprint for the study of other obesity-associated genes in the hope that this strategy will result in increased understanding of the biological mechanisms underlying body weight regulation.
Funded by: Wellcome Trust: 077016/Z/05/Z
Trends in genetics : TIG 2010;26;6;266-74
Detailed investigation of the role of common and low-frequency WFS1 variants in type 2 diabetes risk.
Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
Objective: Wolfram syndrome 1 (WFS1) single nucleotide polymorphisms (SNPs) are associated with risk of type 2 diabetes. In this study we aimed to refine this association and investigate the role of low-frequency WFS1 variants in type 2 diabetes risk.
Research design and methods: For fine-mapping, we sequenced WFS1 exons, splice junctions, and conserved noncoding sequences in samples from 24 type 2 diabetic case and 68 control subjects, selected tagging SNPs, and genotyped these in 959 U.K. type 2 diabetic case and 1,386 control subjects. The same genomic regions were sequenced in samples from 1,235 type 2 diabetic case and 1,668 control subjects to compare the frequency of rarer variants between case and control subjects.
Results: Of 31 tagging SNPs, the most strongly associated was the previously untested 3' untranslated region SNP rs1046320 (P = 0.008); odds ratio 0.84 and P = 6.59 × 10⁻⁷ on further replication in 3,753 case and 4,198 control subjects. High correlation between rs1046320 and the original strongest SNP (rs10010131) (r² = 0.92) meant that we could not differentiate between their effects in our samples. There was no difference in the cumulative frequency of 82 rare (minor allele frequency [MAF] <0.01) nonsynonymous variants between type 2 diabetic case and control subjects (P = 0.79). Two intermediate frequency (MAF 0.01-0.05) nonsynonymous changes also showed no statistical association with type 2 diabetes.
Conclusions: We identified six highly correlated SNPs that show strong and comparable associations with risk of type 2 diabetes, but further refinement of these associations will require large sample sizes (>100,000) or studies in ethnically diverse populations. Low frequency variants in WFS1 are unlikely to have a large impact on type 2 diabetes risk in white U.K. populations, highlighting the complexities of undertaking association studies with low-frequency variants identified by resequencing.
Funded by: British Heart Foundation; Medical Research Council: MC_U106179471; Wellcome Trust: 064890, 077016, 077016/Z/05/Z, 081682
Characterization of a hotspot for mimicry: assembly of a butterfly wing transcriptome to genomic sequence at the HmYb/Sb locus.
Department of Zoology, University of Cambridge, UK.
The mimetic wing patterns of Heliconius butterflies are an excellent example of both adaptive radiation and convergent evolution. Alleles at the HmYb and HmSb loci control the presence/absence of hindwing bar and hindwing margin phenotypes respectively between divergent races of Heliconius melpomene, and also between sister species. Here, we used fine-scale linkage mapping to identify and sequence a BAC tilepath across the HmYb/Sb loci. We also generated transcriptome sequence data for two wing pattern forms of H. melpomene that differed in HmYb/Sb alleles using 454 sequencing technology. Custom scripts were used to process the sequence traces and generate transcriptome assemblies. Genomic sequence for the HmYb/Sb candidate region was annotated both using the MAKER pipeline and manually using transcriptome sequence reads. In total, 28 genes were identified in the HmYb/Sb candidate region, six of which have alternative splice forms. None of these are orthologues of genes previously identified as being expressed in butterfly wing pattern development, implying previously undescribed molecular mechanisms of pattern determination on Heliconius wings. The use of next-generation sequencing has therefore facilitated DNA annotation of a poorly characterized genome, and generated hypotheses regarding the identity of wing pattern regulators at the HmYb/Sb loci.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E008836/1, BB/E011845/1, BB/G00661X/1; Medical Research Council: G0900740
Molecular ecology 2010;19 Suppl 1;240-54
The Pfam protein families database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; Wellcome Trust: 087656, WT077044/Z/05/Z
Nucleic acids research 2010;38;Database issue;D211-22
Ensembl's 10th year.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BBE0116401, BBS/B/13438, BBS/B/13462; Wellcome Trust: 062023, 077198
Nucleic acids research 2010;38;Database issue;D557-62
Evaluating the discriminative power of multi-trait genetic risk scores for type 2 diabetes in a northern Swedish population.
Department of Nutrition Sciences, University of Ottawa, Ottawa, ON, Canada.
Aims/hypothesis: We determined whether single nucleotide polymorphisms (SNPs) previously associated with diabetogenic traits improve the discriminative power of a type 2 diabetes genetic risk score.
Methods: Participants (n = 2,751) were genotyped for 73 SNPs previously associated with type 2 diabetes, fasting glucose/insulin concentrations, obesity or lipid levels, from which five genetic risk scores (one for each of the four traits and one combining all SNPs) were computed. Type 2 diabetes patients and non-diabetic controls (n = 1,327/1,424) were identified using medical records in addition to an independent oral glucose tolerance test.
Results: Model 1, including only SNPs associated with type 2 diabetes, had a discriminative power of 0.591 (p < 1.00 × 10⁻²⁰ vs null model) as estimated by the area under the receiver operator characteristic curve (ROC AUC). Model 2, including only fasting glucose/insulin SNPs, had a significantly higher discriminative power than the null model (ROC AUC 0.543; p = 9.38 × 10⁻⁶ vs null model), but lower discriminative power than model 1 (p = 5.92 × 10⁻⁵). Model 3, with only lipid-associated SNPs, had significantly higher discriminative power than the null model (ROC AUC 0.565; p = 1.44 × 10⁻⁹) and was not statistically different from model 1 (p = 0.083). The ROC AUC of model 4, which included only obesity SNPs, was 0.557 (p = 2.30 × 10⁻⁷ vs null model) and smaller than model 1 (p = 0.025). Finally, the model including all SNPs yielded a significant improvement in discriminative power compared with the null model (p < 1.0 × 10⁻²⁰) and model 1 (p = 1.32 × 10⁻⁵); its ROC AUC was 0.626.
Conclusions/interpretation: Adding SNPs previously associated with fasting glucose, insulin, lipids or obesity to a genetic risk score for type 2 diabetes significantly increases the power to discriminate between people with and without clinically manifest type 2 diabetes compared with a model including only conventional type 2 diabetes loci.
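As a rough illustration of the quantities reported above, the following sketch computes an unweighted genetic risk score (a simple count of risk alleles) and its ROC AUC on simulated genotypes. Everything here is an assumption made for illustration: the helper names, the sample sizes, the SNP count and the risk-allele frequencies are hypothetical, and the study's own scoring and statistical comparisons may differ.

```python
import random


def risk_score(genotypes):
    """Unweighted genetic risk score: total risk alleles (0, 1 or 2 per SNP)."""
    return sum(genotypes)


def roc_auc(case_scores, control_scores):
    """ROC AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control, with ties counted as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def simulate(n_people, n_snps, risk_allele_freq, rng):
    """Hypothetical genotypes: two independent allele draws per SNP."""
    return [
        [(rng.random() < risk_allele_freq) + (rng.random() < risk_allele_freq)
         for _ in range(n_snps)]
        for _ in range(n_people)
    ]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
# Assumed frequencies: cases carry risk alleles slightly more often than controls.
cases = [risk_score(g) for g in simulate(300, 20, 0.33, rng)]
controls = [risk_score(g) for g in simulate(300, 20, 0.30, rng)]
auc = roc_auc(cases, controls)
print(f"ROC AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation, which is why values such as 0.591 or 0.626 indicate modest discriminative power.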
Funded by: Wellcome Trust: 077016/Z/05/Z
COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer.
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed through 13,423 genes in almost 370,000 tumours, describing over 90,000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC and while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer, and COSMIC now displays details of these analyses as well. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.
Funded by: Wellcome Trust: 077012/Z/05/Z
Nucleic acids research 2010;38;Database issue;D652-7
Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.
Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany.
We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.
Funded by: Chief Scientist Office: CZB/4/540, ETM/75; Medical Research Council: G0600329, G0800675, G0800759; NCRR NIH HHS: M01-RR00425; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-35129, N01-HC-45133, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, R01 HL087652, U01 HL080295; NIAMS NIH HHS: K08 AR055688, K08 AR055688-01A1S1, K08 AR055688-03, K08 AR055688-04; NIDDK NIH HHS: DK 063491, DK062413, DK062420, DK062422, DK062423, DK062429, DK062431, DK062432, DK064869, DK069513, DK084554, DK76984, P01 DK046763, P01-DK046763, P30 DK043351, R01 DK064869, U01 DK062420, U01 DK062431, U01 DK062432; Wellcome Trust: 089120, WT089120/Z/09/Z
Nature genetics 2010;42;12;1118-25
Nonobese diabetic congenic strain analysis of autoimmune diabetes reveals genetic complexity of the Idd18 locus and identifies Vav3 as a candidate gene.
Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge.
We have used the public sequencing and annotation of the mouse genome to delimit the previously resolved type 1 diabetes (T1D) insulin-dependent diabetes (Idd)18 interval to a region on chromosome 3 that includes the immunologically relevant candidate gene, Vav3. To test the candidacy of Vav3, we developed a novel congenic strain that enabled the resolution of Idd18 to a 604-kb interval, designated Idd18.1, which contains only two annotated genes: the complete sequence of Vav3 and the last exon of the gene encoding NETRIN G1, Ntng1. Targeted sequencing of Idd18.1 in the NOD mouse strain revealed that allelic variation between NOD and C57BL/6J (B6) occurs in noncoding regions with 138 single nucleotide polymorphisms concentrated in the introns between exons 20 and 27 and immediately after the 3' untranslated region. We observed differential expression of VAV3 RNA transcripts in thymocytes when comparing congenic mouse strains with B6 or NOD alleles at Idd18.1. The T1D protection associated with B6 alleles of Idd18.1/Vav3 requires the presence of B6 protective alleles at Idd3, which are correlated with increased IL-2 production and regulatory T cell function. In the absence of B6 protective alleles at Idd3, we detected a second T1D protective B6 locus, Idd18.3, which is closely linked to, but distinct from, Idd18.1. Therefore, genetic mapping, sequencing, and gene expression evidence indicate that alteration of VAV3 expression is an etiological factor in the development of autoimmune beta-cell destruction in NOD mice. This study also demonstrates that a congenic strain mapping approach can isolate closely linked susceptibility genes.
Funded by: NIAID NIH HHS: AI 15416; Wellcome Trust: 061858, 061859, 079895
Journal of immunology (Baltimore, Md. : 1950) 2010;184;9;5075-84
Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight.
Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter, UK.
To identify genetic variants associated with birth weight, we meta-analyzed six genome-wide association (GWA) studies (n = 10,623 Europeans from pregnancy/birth cohorts) and followed up two lead signals in 13 replication studies (n = 27,591). rs900400 near LEKR1 and CCNL1 (P = 2 × 10⁻³⁵) and rs9883204 in ADCY5 (P = 7 × 10⁻¹⁵) were robustly associated with birth weight. Correlated SNPs in ADCY5 were recently implicated in regulation of glucose levels and susceptibility to type 2 diabetes, providing evidence that the well-described association between lower birth weight and subsequent type 2 diabetes has a genetic component, distinct from the proposed role of programming by maternal nutrition. Using data from both SNPs, we found that the 9% of Europeans carrying four birth weight-lowering alleles were, on average, 113 g (95% CI 89-137 g) lighter at birth than the 24% with zero or one alleles (P(trend) = 7 × 10⁻³⁰). The impact on birth weight is similar to that of a mother smoking 4-5 cigarettes per day in the third trimester of pregnancy.
Funded by: British Heart Foundation; Canadian Institutes of Health Research: MOP 82893; Chief Scientist Office: CZB/4/710; Department of Health: PHCS/C4/4/016; FIC NIH HHS: TW05596; Medical Research Council: G0000934, G0500070, G0500539, G0600331, G0600705, G0601261, G0601653, G0800582, G0801056, G9815508; NCRR NIH HHS: RR20649; NHLBI NIH HHS: HL068041, HL085144, HL0876792; NICHD NIH HHS: HD034568, HD05450, HD056465, R01 HD056465, R24 HD050924; NIDDK NIH HHS: 1R01DK075787, DK075787, DK078150, DK56350; NIEHS NIH HHS: ES10126; NIMH NIH HHS: MH083268, MH63706; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 085301, 085541, 090532, 89061/Z/09/Z
Nature genetics 2010;42;5;430-5
Mouse welfare terms
Animal Technology and Welfare. 2010;9;175
A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1.
Wellcome Trust Centre for Human Genetics, Oxford, UK.
To identify new susceptibility loci for psoriasis, we undertook a genome-wide association study of 594,224 SNPs in 2,622 individuals with psoriasis and 5,667 controls. We identified associations at eight previously unreported genomic loci. Seven loci harbored genes with recognized immune functions (IL28RA, REL, IFIH1, ERAP1, TRAF3IP2, NFKBIA and TYK2). These associations were replicated in 9,079 European samples (six loci with a combined P < 5 × 10⁻⁸ and two loci with a combined P < 5 × 10⁻⁷). We also report compelling evidence for an interaction between the HLA-C and ERAP1 loci (combined P = 6.95 × 10⁻⁶). ERAP1 plays an important role in MHC class I peptide processing. ERAP1 variants only influenced psoriasis susceptibility in individuals carrying the HLA-C risk allele. Our findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis.
Funded by: Department of Health; Medical Research Council: G0000934, G0601387; Wellcome Trust: 068545/Z/02, 083948/Z/07/Z, 084726
Nature genetics 2010;42;11;985-90
Transcription profiling in human platelets reveals LRRFIP1 as a novel protein regulating platelet function.
Department of Cardiovascular Science, University of Leicester, Clinical Sciences Wing, Glenfield Hospital, Leicester, UK.
Within the healthy population, there is substantial, heritable, and interindividual variability in the platelet response. We explored whether a proportion of this variability could be accounted for by interindividual variation in gene expression. Through a correlative analysis of genome-wide platelet RNA expression data from 37 subjects representing the normal range of platelet responsiveness within a cohort of 500 subjects, we identified 63 genes in which transcript levels correlated with variation in the platelet response to adenosine diphosphate and/or the collagen-mimetic peptide, cross-linked collagen-related peptide. Many of these encode proteins with no reported function in platelets. An association study of 6 of the 63 genes in 4235 cases and 6379 controls showed a putative association with myocardial infarction for COMMD7 (COMM domain-containing protein 7) and a major deviation from the null hypothesis for LRRFIP1 [leucine-rich repeat (in FLII) interacting protein 1]. Morpholino-based silencing in Danio rerio identified a modest role for commd7 and a significant effect for lrrfip1 as positive regulators of thrombus formation. Proteomic analysis of human platelet LRRFIP1-interacting proteins indicated that LRRFIP1 functions as a component of the platelet cytoskeleton, where it interacts with the actin-remodeling proteins Flightless-1 and Drebrin. Taken together, these data reveal novel proteins regulating the platelet response.
Funded by: British Heart Foundation: RG/09/012/28096; Medical Research Council: MC_U105292688
Computing behaviour in complex synapses
Rare copy number variants: a point of rarity in genetic risk for bipolar disorder and schizophrenia.
Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff CF14 4XN, Wales, UK.
Context: Recent studies suggest that copy number variation in the human genome is extensive and may play an important role in susceptibility to disease, including neuropsychiatric disorders such as schizophrenia and autism. The possible involvement of copy number variants (CNVs) in bipolar disorder has received little attention to date.
Objectives: To determine whether large (>100,000 base pairs) and rare (found in <1% of the population) CNVs are associated with susceptibility to bipolar disorder and to compare with findings in schizophrenia.
Design: A genome-wide survey of large, rare CNVs in a case-control sample using a high-density microarray.
Setting: The Wellcome Trust Case Control Consortium.
Participants: There were 1697 cases of bipolar disorder and 2806 nonpsychiatric controls. All participants were white UK residents.
Main outcome measures: Overall load of CNVs and presence of rare CNVs.
Results: The burden of CNVs in bipolar disorder was not increased compared with controls and was significantly less than in schizophrenia cases. The CNVs previously implicated in the etiology of schizophrenia were not more common in cases with bipolar disorder.
Conclusions: Schizophrenia and bipolar disorder differ with respect to CNV burden in general and association with specific CNVs in particular. Our data are consistent with the possibility that possession of large, rare deletions may modify the phenotype in those at risk of psychosis: those possessing such events are more likely to be diagnosed as having schizophrenia, and those without them are more likely to be diagnosed as having bipolar disorder.
Funded by: Chief Scientist Office: CZB/4/540, ETM/75; Medical Research Council: G0600329, G0701003, G0701420, G0800509, G0800759, G90/106; Wellcome Trust: 061858, 079643, 090532
Archives of general psychiatry 2010;67;4;318-27
Being more realistic about the public health impact of genomic medicine.
University of Queensland Centre for Clinical Research, The University of Queensland, Herston, Queensland, Australia.
PLoS medicine 2010;7;10
A pharmacometric model describing the relationship between warfarin dose and INR response with respect to variations in CYP2C9, VKORC1, and age.
Department of Medical Sciences, Clinical Pharmacology, Uppsala University Hospital, Uppsala, Sweden.
The objective of the study was to update a previous NONMEM model to describe the relationship between warfarin dose and international normalized ratio (INR) response, to decrease the dependence of the model on pharmacokinetic (PK) data, and to improve the characterization of rare genotype combinations. The effects of age and CYP2C9 genotype on S-warfarin clearance were estimated from high-quality PK data. Thereafter, a temporal dose-response (K-PD) model was developed from information on dose, INR, age, and CYP2C9 and VKORC1 genotype, with drug clearance as a covariate. Two transit compartment chains accounted for the delay between exposure and response. CYP2C9 genotype was identified as the single most important predictor of required dose, causing a difference of up to 4.2-fold in the maintenance dose. VKORC1 accounted for a difference of up to 2.1-fold in dose, and age reduced the dose requirement by ~6% per decade. This reformulated K-PD model decreases dependence on PK data and enables robust assessment of INR response and dose predictions, even in individuals with rare genotype combinations.
Clinical pharmacology and therapeutics 2010;87;6;727-34
KSHV-encoded miRNAs target MAF to induce endothelial cell reprogramming.
Cancer Research UK Viral Oncology Group, University College London Cancer Institute, University College London, London WC1E 6BT, United Kingdom.
Kaposi sarcoma herpesvirus (KSHV) induces transcriptional reprogramming of endothelial cells. In particular, KSHV-infected lymphatic endothelial cells (LECs) show an up-regulation of genes associated with blood vessel endothelial cells (BECs). Consequently, KSHV-infected tumor cells in Kaposi sarcoma are poorly differentiated endothelial cells, expressing markers of both LECs and BECs. MicroRNAs (miRNAs) are short noncoding RNA molecules that act post-transcriptionally to negatively regulate gene expression. Here we validate expression of the KSHV-encoded miRNAs in Kaposi sarcoma lesions and demonstrate that these miRNAs contribute to viral-induced reprogramming by silencing the cellular transcription factor MAF (musculoaponeurotic fibrosarcoma oncogene homolog). MAF is expressed in LECs but not in BECs. We identify a novel role for MAF as a transcriptional repressor, preventing expression of BEC-specific genes, thereby maintaining the differentiation status of LECs. These findings demonstrate that viral miRNAs could influence the differentiation status of infected cells, and thereby contribute to KSHV-induced oncogenesis.
Funded by: Cancer Research UK; Medical Research Council: G0800168
Genes & development 2010;24;2;195-205
Evolution of MRSA during hospital transmission and intercontinental spread.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Current methods for differentiating isolates of predominant lineages of pathogenic bacteria often do not provide sufficient resolution to define precise relationships. Here, we describe a high-throughput genomics approach that provides a high-resolution view of the epidemiology and microevolution of a dominant strain of methicillin-resistant Staphylococcus aureus (MRSA). This approach reveals the global geographic structure within the lineage, its intercontinental transmission through four decades, and the potential to trace person-to-person transmission within a hospital environment. The ability to interrogate and resolve bacterial populations is applicable to a range of infectious diseases, as well as microbial ecology.
Funded by: Department of Health; Wellcome Trust: 076964
Science (New York, N.Y.) 2010;327;5964;469-74
Evolutionary dynamics of Clostridium difficile over short and long time scales.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
Clostridium difficile has rapidly emerged as the leading cause of antibiotic-associated diarrheal disease, with the transcontinental spread of various PCR ribotypes, including 001, 017, 027 and 078. However, the genetic basis for the emergence of C. difficile as a human pathogen is unclear. Whole genome sequencing was used to analyze genetic variation and virulence of a diverse collection of thirty C. difficile isolates, to determine both macro and microevolution of the species. Horizontal gene transfer and large-scale recombination of core genes have shaped the C. difficile genome over both short and long time scales. Phylogenetic analysis demonstrates C. difficile is a genetically diverse species, which has evolved within the last 1.1-85 million years. By contrast, the disease-causing isolates have arisen from multiple lineages, suggesting that virulence evolved independently in the highly epidemic lineages.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2010;107;16;7527-32
Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution.
Regensburg University Medical Center, Department of Epidemiology and Preventive Medicine, Regensburg, Germany.
Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10⁻⁹ to P = 1.8 × 10⁻⁴⁰) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10⁻³ to P = 1.2 × 10⁻¹³). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.
Funded by: British Heart Foundation; Chief Scientist Office: CZB/4/710; Department of Health; Intramural NIH HHS: Z01 HG000024-14; Medical Research Council: G0000934, G0401527, G0500115, G0501184, G0600331, G0600705, G0601261, G0701863, G0801056, G9521010, MC_QA137934, MC_U106179472, MC_U106188470, MC_U127561128, MC_UP_A390_1107; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262, R01 CA067262-14, U01 CA049449, U01 CA049449-21, U01 CA098233, U01 CA098233-08, U01-CA098233; NCRR NIH HHS: UL1 RR025005, UL1 RR025005-04, UL1-RR025005, UL1-RR025005; NHGRI NIH HHS: HG002651, HG005581, N01 HG065403, N01-HG-65403, R01 HG002651, R01 HG002651-05, RC2 HG005581, RC2 HG005581-02, T32 HG000040, T32 HG000040-14, U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, T32-HG00040, U01-HG004399, U01-HG004402; NHLBI NIH HHS: HL043851, HL084729, HL71981, K99 HL094535, K99 HL094535-02, N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085080, N01 HC085081, N01 HC085082, N01 HC085083, N01 HC085084, N01 HC085085, N01 HC085086, N01-HC-55018, N01-HC55222, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981, R01 HL071981-07, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087647, R01 HL087647-03, R01 HL087652, R01 HL087652-03, R01 HL087679-03, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-04, R01 HL117078, R01-HL087647, R01-HL59367, U01 HL054527, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084729, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL72515, K99HL094535, N01-HC-25195, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC15103, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, 
N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, R01-HL086694, R01-HL087641, R01-HL087679, R01-HL087700, R01-HL088119, R01HL087652, U01-HL084756; NIA NIH HHS: N01 AG012100, N01 AG012109, N01-AG-1-2109, R01 AG031890, R01 AG031890-02, N01-AG-12100, R01-AG031890; NIDDK NIH HHS: DK062370, DK072193, DK075787, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30 DK046200, P30 DK046200-14, P30 DK072488, P30 DK072488-06, P60 DK020541, R01 DK056690, R01 DK058845, R01 DK058845-11, R01 DK068336, R01 DK068336-03, R01 DK072193, R01 DK072193-05, R01 DK073490, R01 DK073490-05, R01 DK075681, R01 DK075681-04, R01 DK075787, R01 DK075787-05, R01 DK089256, R01-DK068336, R01-DK075787, U01 DK062370, U01 DK062370-08, U01 DK062418, U01 DK062418-06, K23-DK080145, P30-DK072488, R01-DK-073490, R01-DK075681, R01-DK075787, U01-DK062418; NIGMS NIH HHS: U01 GM074518, U01 GM074518-05, U01-GM074518; NIMH NIH HHS: R01 MH063706-05, R01 MH084698, R01 MH084698-03, RL1 MH083268, RL1 MH083268-05, 1RL1-MH083268-01, MH084698, R01-MH63706; PHS HHS: 263-MA-410953; Wellcome Trust: 064890, 068545, 072960, 075491, 076113, 077011, 077016, 077016/Z/05/Z, 079557, 079895, 081682, 083270, 085235, 085301, 086596, 088885, 089061, 090532, 091746, 068545/Z/02, 072960, 076113/B/04/Z, 091746/Z/10/Z, WT086596/Z/08/Z
Nature genetics 2010;42;11;949-60
A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk.
Max-Delbrück-Center for Molecular Medicine (MDC), Berlin, Germany.
Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases that may not be apparent from genome-wide association studies alone. Recent advances in rat genomics are facilitating systems-genetics approaches. Here we report the use of integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven inflammatory network (IDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and which was regulated in multiple tissues by a locus on rat chromosome 15q25. We show that Epstein-Barr virus induced gene 2 (Ebi2, also known as Gpr183), which lies at this locus and controls B lymphocyte migration, is expressed in macrophages and regulates the IDIN. The human orthologous locus on chromosome 13q32 controlled the human equivalent of the IDIN, which was conserved in monocytes. IDIN genes were more likely to associate with susceptibility to type 1 diabetes (T1D), a macrophage-associated autoimmune disease, than randomly selected immune response genes (P = 8.85 × 10(-6)). The human locus controlling the IDIN was associated with the risk of T1D at single nucleotide polymorphism rs9585056 (P = 7.0 × 10(-10); odds ratio, 1.15), which was one of five single nucleotide polymorphisms in this region associated with EBI2 (GPR183) expression. These data implicate IRF7 network genes and their regulatory locus in the pathogenesis of T1D.
Funded by: British Heart Foundation: P301/10/0290; Medical Research Council: MC_U120061454, MC_U120085815, MC_U120097112; Wellcome Trust: 061858, 076113, 089989
Genome sequence of a recently emerged, highly transmissible, multi-antibiotic- and antiseptic-resistant variant of methicillin-resistant Staphylococcus aureus, sequence type 239 (TW).
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, United Kingdom.
The 3.1-Mb genome of an outbreak methicillin-resistant Staphylococcus aureus (MRSA) strain (TW20) contains evidence of recently acquired DNA, including two large regions (635 kb and 127 kb). The strain is resistant to a wide range of antibiotics, antiseptics, and heavy metals due to resistance genes encoded on mobile genetic elements and also mutations in housekeeping genes.
Funded by: Wellcome Trust
Journal of bacteriology 2010;192;3;888-92
Emx2 and early hair cell development in the mouse inner ear.
Department of Biomedical Science, Addison Building, Western Bank, Sheffield S10 2TN, UK.
Emx2 is a homeodomain protein that plays a critical role in inner ear development. Homozygous null mice die at birth with a range of defects in the CNS, renal system and skeleton. The cochlea is shorter than normal with about 60% fewer auditory hair cells. It appears to lack outer hair cells and some supporting cells are either absent or fail to differentiate. Many of the hair cells differentiate in pairs and although their hair bundles develop normally their planar cell polarity is compromised. Measurements of cell polarity suggest that classic planar cell polarity molecules are not directly influenced by Emx2 and that polarity is compromised by developmental defects in the sensory precursor population or by defects in epithelial cues for cell alignment. Planar cell polarity is normal in the vestibular epithelia although polarity reversal across the striola is absent in both the utricular and saccular maculae. In contrast, cochlear hair cell polarity is disorganized. The expression domain for Bmp4 is expanded and Fgfr1 and Prox1 are expressed in fewer cells in the cochlear sensory epithelium of Emx2 null mice. We conclude that Emx2 regulates early developmental events that balance cell proliferation and differentiation in the sensory precursor population.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust
Developmental biology 2010;340;2;547-56
Disease-associated XMRV sequences are consistent with laboratory contamination.
MRC Centre for Medical Molecular Virology, Division of Infection and Immunity, University College London, 46 Cleveland St, London W1T 4JF, UK.
Background: Xenotropic murine leukaemia viruses (MLV-X) are endogenous gammaretroviruses that infect cells from many species, including humans. Xenotropic murine leukaemia virus-related virus (XMRV) is a retrovirus that has been the subject of intense debate since its detection in samples from humans with prostate cancer (PC) and chronic fatigue syndrome (CFS). Controversy has arisen from the failure of some studies to detect XMRV in PC or CFS patients and from inconsistent detection of XMRV in healthy controls.
Results: Here we demonstrate that Taqman PCR primers previously described as XMRV-specific can amplify common murine endogenous viral sequences from mouse, suggesting that mouse DNA can contaminate patient samples and confound specific XMRV detection. To consider the provenance of XMRV, we sequenced XMRV from the cell line 22Rv1, which is infected with an MLV-X that is indistinguishable from patient-derived XMRV. Bayesian phylogenies clearly show that XMRV sequences reportedly derived from unlinked patients form a monophyletic clade with interspersed 22Rv1 clones (posterior probability >0.99). The cell line-derived sequences are ancestral to the patient-derived sequences (posterior probability >0.99). Furthermore, pol sequences apparently amplified from PC patient material (VP29 and VP184) are recombinants of XMRV and Moloney MLV (MoMLV), a virus with an envelope that lacks tropism for human cells. Considering the diversity of XMRV, we show that the mean pairwise genetic distance among env and pol 22Rv1-derived sequences exceeds that of patient-associated sequences (Wilcoxon rank sum test: p = 0.005 and p < 0.001 for pol and env, respectively). Thus XMRV sequences acquire diversity in a cell line but not in patient samples. These observations are difficult to reconcile with the hypothesis that published XMRV sequences are related by a process of infectious transmission.
Conclusions: We provide several independent lines of evidence that XMRV detected by sensitive PCR methods in patient samples is the likely result of PCR contamination with mouse DNA and that the described clones of XMRV arose from the tumour cell line 22Rv1, which was probably infected with XMRV during xenografting in mice. We propose that XMRV might not be a genuine human pathogen.
Funded by: Medical Research Council: G0801172, G0801172(87743), G9721629; Wellcome Trust: 090940, WT076608, WT090940
Interleukin-8 mediates resistance to antiangiogenic agent sunitinib in renal cell carcinoma.
Laboratory of Cancer Genetics, Laboratory of Computational Biology, Van Andel Research Institute, Grand Rapids, Michigan 49503, USA.
The broad spectrum kinase inhibitor sunitinib is a first-line therapy for advanced clear cell renal cell carcinoma (ccRCC), a deadly form of kidney cancer. Unfortunately, most patients develop sunitinib resistance and progressive disease after about 1 year of treatment. In this study, we evaluated the mechanisms of resistance to sunitinib to identify the potential tactics to overcome it. Xenograft models were generated that mimicked clinical resistance to sunitinib. Higher microvessel density was found in sunitinib-resistant tumors, indicating that an escape from antiangiogenesis occurred. Notably, escape coincided with increased secretion of interleukin-8 (IL-8) from tumors into the plasma, and coadministration of an IL-8 neutralizing antibody resensitized tumors to sunitinib treatment. In patients who were refractory to sunitinib treatment, IL-8 expression was elevated in ccRCC tumors, supporting the concept that IL-8 levels might predict clinical response to sunitinib. Our results reveal IL-8 as an important contributor to sunitinib resistance in ccRCC and a candidate therapeutic target to reverse acquired or intrinsic resistance to sunitinib in this malignancy.
Funded by: Wellcome Trust: 077012/Z/05/Z
Cancer research 2010;70;3;1063-71
Characterising and predicting haploinsufficiency in the human genome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
Funded by: NIGMS NIH HHS: R01 GM067779; Wellcome Trust: 077014/Z/05/Z
PLoS genetics 2010;6;10;e1001154
Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites.
Institute for Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, UK. Paul.Hunt@ed.ac.uk
Background: Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum.
Results: A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme.
Conclusions: This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D019621/1; Medical Research Council: G0400476, G0900740; Wellcome Trust: 082611/Z/07/Z
BMC genomics 2010;11;499
Systematic analysis of human protein complexes identifies chromosome segregation proteins.
Research Institute of Molecular Pathology (IMP), Dr. Bohr-Gasse 7, A-1030 Vienna, Austria.
Chromosome segregation and cell division are essential, highly ordered processes that depend on numerous protein complexes. Results from recent RNA interference screens indicate that the identity and composition of these protein complexes are incompletely understood. Using gene tagging on bacterial artificial chromosomes, protein localization, and tandem-affinity purification-mass spectrometry, the MitoCheck consortium has analyzed about 100 human protein complexes, many of which had not been characterized or had been characterized only incompletely. This work has led to the discovery of previously unknown, evolutionarily conserved subunits of the anaphase-promoting complex and the gamma-tubulin ring complex, large complexes that are essential for spindle assembly and chromosome segregation. The approaches we describe here are generally applicable to high-throughput follow-up analyses of phenotypic screens in mammalian cells.
Funded by: Austrian Science Fund FWF: F 3407
Science (New York, N.Y.) 2010;328;5978;593-9
Epilepsy and mental retardation limited to females with PCDH19 mutations can present de novo or in single generation families.
SA Pathology, Women's and Children's Hospital, 72 King William Road, North Adelaide, SA 5006, Australia.
Background: Epilepsy and mental retardation limited to females (EFMR) is an intriguing X-linked disorder affecting heterozygous females and sparing hemizygous males. Mutations in the protocadherin 19 (PCDH19) gene have been identified in seven unrelated families with EFMR.
Methods and results: Here, we assessed the frequency of PCDH19 mutations in individuals with clinical features which overlap those of EFMR. We analysed 185 females from three cohorts: 42 with Rett syndrome who were negative for MECP2 and CDKL5 mutations, 57 with autism spectrum disorders, and 86 with epilepsy with or without intellectual disability. No mutations were identified in the Rett syndrome and autism spectrum disorders cohorts, suggesting that despite sharing similar clinical characteristics with EFMR, PCDH19 mutations are not generally associated with these disorders. Among the 86 females with epilepsy (of whom 51 had seizure onset before 3 years), with or without intellectual disability, we identified two (2.3%) missense changes. One (c.1671C>G, p.N557K), reported previously without clinical data, was found in two affected sisters, the first EFMR family without a multigenerational family history of affected females. The second, reported here, is a novel de novo missense change identified in a sporadic female. The change, p.S276P, is predicted to result in functional disturbance of PCDH19 as it affects a highly conserved residue adjacent to the adhesion interface of EC3 of PCDH19.
Conclusions: This de novo PCDH19 mutation in a sporadic female highlights that mutational analysis should be considered in isolated instances of girls with infantile onset seizures and developmental delay, in addition to those with the characteristic family history of EFMR.
Funded by: Wellcome Trust
Journal of medical genetics 2010;47;3;211-6
Four novel loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo.
Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands.
There is increasing evidence that the microcirculation plays an important role in the pathogenesis of cardiovascular diseases. Changes in retinal vascular caliber reflect early microvascular disease and predict incident cardiovascular events. We performed a genome-wide association study to identify genetic variants associated with retinal vascular caliber. We analyzed data from four population-based discovery cohorts with 15,358 unrelated Caucasian individuals, who are members of the Cohort for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, and replicated findings in four independent Caucasian cohorts (n = 6,652). All participants had retinal photography, and retinal arteriolar and venular caliber were measured using computer software. In the discovery cohorts, 179 single nucleotide polymorphisms (SNPs) spread across five loci were significantly associated (p < 5.0×10(-8)) with retinal venular caliber, but none showed association with arteriolar caliber. Collectively, these five loci explain 1.0%-3.2% of the variation in retinal venular caliber. Four out of these five loci were confirmed in independent replication samples. In the combined analyses, the top SNPs at each locus were: rs2287921 (19q13; p = 1.61×10(-25), within the RASIP1 locus), rs225717 (6q24; p = 1.25×10(-16), adjacent to the VTA1 and NMBR loci), rs10774625 (12q24; p = 2.15×10(-13), in the region of the ATXN2, SH2B3 and PTPN11 loci), and rs17421627 (5q14; p = 7.32×10(-16), adjacent to the MEF2C locus). In two independent samples, locus 12q24 was also associated with coronary heart disease and hypertension. Our population-based genome-wide association study demonstrates four novel loci associated with retinal venular caliber, an endophenotype of the microcirculation associated with clinical cardiovascular disease. These data provide further insights into the contribution and biological mechanisms of microcirculatory changes that underlie cardiovascular disease.
Funded by: Medical Research Council: G0401527, G0701863, G0801056, MC_U105630924, MC_UP_A100_1003; NCRR NIH HHS: M01RR00069, UL1RR025005; NEI NIH HHS: R01 EY018246, Z01 EY000401-06, Z01 EY000401-07, Z01 EY000426-04, Z01 EY000426-05, Z01EY000401, Z01EY000426, Z99 EY999999, ZIA EY000401-08, ZIA EY000401-09, ZIA EY000401-10, ZIA EY000403-09, ZIA EY000403-10, ZIA EY000426-06, ZIA EY000426-07, ZIA EY000426-08; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01HC-55222, R01 HL087652, R01HL087641, T32HL007902, U01 HL080295; NIA NIH HHS: N01-AG-12100, Z01AG007380; NIDDK NIH HHS: DK063491
PLoS genetics 2010;6;10;e1001184
A genome-wide perspective of genetic variation in human metabolism.
Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
Serum metabolite concentrations provide a direct readout of biological processes in the human body, and they are associated with disorders such as cardiovascular and metabolic diseases. We present a genome-wide association study (GWAS) of 163 metabolic traits measured in human blood from 1,809 participants from the KORA population, with replication in 422 participants of the TwinsUK cohort. For eight out of nine replicated loci (FADS1, ELOVL2, ACADS, ACADM, ACADL, SPTLC3, ETFDH and SLC16A9), the genetic variant is located in or near genes encoding enzymes or solute carriers whose functions match the associating metabolic traits. In our study, the use of metabolite concentration ratios as proxies for enzymatic reaction rates reduced the variance and yielded robust statistical associations with P values ranging from 3 x 10(-24) to 6.5 x 10(-179). These loci explained 5.6%-36.3% of the observed variance in metabolite concentrations. For several loci, associations with clinically relevant parameters have been reported previously.
Funded by: Biotechnology and Biological Sciences Research Council: G20234; Wellcome Trust: 091746
Nature genetics 2010;42;2;137-41
Orphan CpG islands identify numerous conserved promoters in the mammalian genome.
Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.
CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Querying their apparent importance, the number of CGIs is reported to vary widely in different species and many do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are "orphans" that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.
Funded by: Medical Research Council: G0800026, G0900627; Wellcome Trust: 077224
PLoS genetics 2010;6;9;e1001134
Hypokalemic periodic paralysis associated with thyrotoxicosis, renal tubular acidosis and nephrogenic diabetes insipidus.
Division of Endocrinology and Metabolism, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, Korea.
A 19-year-old girl presented at our emergency room with hypokalemic periodic paralysis. She had a thyrotoxic goiter and had experienced three paralytic attacks during the previous 2 years on occasions when she stopped taking antithyroid drugs. In addition to thyrotoxic periodic paralysis (TPP), she had metabolic acidosis, urinary potassium loss, polyuria and polydipsia. Her reduced ability to acidify urine during spontaneous metabolic acidosis was confirmed by detection of coexisting distal renal tubular acidosis (RTA). The polyuria and polydipsia were caused by nephrogenic diabetes insipidus (DI), which was diagnosed using the water deprivation test and vasopressin administration. Her recurrent and frequent paralytic attacks may have resulted from the combined effects of thyrotoxicosis and RTA. Although the paralytic attacks did not recur after thyroid function improved, mild acidosis and nephrogenic DI have persisted. Patients with TPP, especially females with atypical metabolic features, should be investigated for possible precipitating factors.
Endocrine journal 2010;57;4;347-50
Detailed physiologic characterization reveals diverse mechanisms for novel genetic loci regulating glucose and insulin metabolism in humans.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Objective: Recent genome-wide association studies have revealed loci associated with glucose and insulin-related traits. We aimed to characterize 19 such loci using detailed measures of insulin processing, secretion, and sensitivity to help elucidate their role in regulation of glucose control, insulin secretion and/or action.
Research design and methods: We investigated associations of loci identified by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) with circulating proinsulin, measures of insulin secretion and sensitivity from oral glucose tolerance tests (OGTTs), euglycemic clamps, insulin suppression tests, or frequently sampled intravenous glucose tolerance tests in nondiabetic humans (n = 29,084).
Results: The glucose-raising allele in MADD was associated with abnormal insulin processing (a dramatic effect on higher proinsulin levels, but no association with insulinogenic index) at extremely persuasive levels of statistical significance (P = 2.1 x 10(-71)). Defects in insulin processing and insulin secretion were seen in glucose-raising allele carriers at TCF7L2, SLC30A8, GIPR, and C2CD4B. Abnormalities in early insulin secretion were suggested in glucose-raising allele carriers at MTNR1B, GCK, FADS1, DGKB, and PROX1 (lower insulinogenic index; no association with proinsulin or insulin sensitivity). Two loci previously associated with fasting insulin (GCKR and IGF1) were associated with OGTT-derived insulin sensitivity indices in a consistent direction.
Conclusions: Genetic loci identified through their effect on hyperglycemia and/or hyperinsulinemia demonstrate considerable heterogeneity in associations with measures of insulin processing, secretion, and sensitivity. Our findings emphasize the importance of detailed physiological characterization of such loci for improved understanding of pathways associated with alterations in glucose homeostasis and eventually type 2 diabetes.
Funded by: Medical Research Council: G0701863, MC_U106179471, MC_U147574213, MC_U147574239, MC_UP_A620_1014, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087647; NIDDK NIH HHS: R01 DK029867
An immune response network associated with blood lipid levels.
Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.
Funded by: Wellcome Trust: WT089061, WT089062
PLoS genetics 2010;6;9;e1001113
International network of cancer genome projects.
The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
Funded by: Cancer Research UK: 6613; NCI NIH HHS: P01 CA117969, P01 CA117969-04S1, P01 CA117969-05, P50 CA102701, P50 CA102701-08, P50 CA127003, P50 CA127003-04, P50 CA127003-05; NHGRI NIH HHS: R01 HG001806-02; NIDDK NIH HHS: K08 DK071329, K08 DK071329-04, K08 DK071329-05; Wellcome Trust: 077198, 088340, 093867
Integrating common and rare genetic variation in diverse human populations.
Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02138, USA.
Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
Funded by: Medical Research Council: G0000934; NHGRI NIH HHS: U54 HG003273; NIDDK NIH HHS: P30 DK043351; Wellcome Trust: 068545, 068545/Z/02, 076113, 077011, 077014, 082371, 089061, 089062, 091746
Failure to validate association between 12p13 variants and ischemic stroke.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F019394/1; Intramural NIH HHS; Medical Research Council: G0000934, G0700704, G0701075, G0800509; NCI NIH HHS: CA 047988; NCRR NIH HHS: M01 RR 165001, M01 RR07122, R54 RR020278; NHGRI NIH HHS: U01 HG004436; NHLBI NIH HHS: HL 043851, HL69757, R01 HL087676, R25 HL088724; NIDDK NIH HHS: P30 DK072488; NINDS NIH HHS: 1R01 NS059727, K08 NS045802, NS056302, NS30678, NS34447, NS36695, R01 NS 42733, R01 NS059727, R01 NS059727-01A1, R01 NS45012, R21NS064908; PHS HHS: P60 12583; Wellcome Trust: 068545/Z/02
The New England journal of medicine 2010;362;16;1547-50
The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.
Methods and findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.
Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.
Funded by: Wellcome Trust: 079703, 095201, WT085775/Z/08/Z
PLoS neglected tropical diseases 2010;4;4;e658
Reverse engineering a gene network using an asynchronous parallel evolution strategy.
Laboratory for Development & Evolution, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK.
Background: The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task.
Results: Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested.
Conclusions: Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D000513/1, BB/D00513
BMC systems biology 2010;4;17
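The island-model idea behind the piES entry above can be illustrated with a toy sketch. This is not the authors' implementation: it runs the islands serially and synchronously, uses a simple (mu + lambda) selection scheme, and optimises a stand-in sphere function instead of a gap gene network model. All names and parameter values (`island_es`, `migrate_every`, `sigma`, and so on) are assumptions made for illustration.

```python
import random


def sphere(x):
    """Stand-in objective (sum of squares); the real problem fits a gene network model."""
    return sum(v * v for v in x)


def evolve_island(pop, fitness, n_offspring, sigma, rng):
    """One (mu + lambda) generation: mutate randomly chosen parents, keep the best mu."""
    mu = len(pop)
    offspring = [
        [v + rng.gauss(0.0, sigma) for v in rng.choice(pop)]
        for _ in range(n_offspring)
    ]
    return sorted(pop + offspring, key=fitness)[:mu]


def island_es(fitness, dim=5, n_islands=4, mu=5, n_offspring=20,
              generations=60, migrate_every=10, sigma=0.3, seed=0):
    """Run several islands and exchange individuals around a ring.

    Returns (best fitness at start, best fitness at end).
    """
    if mu < 2:
        raise ValueError("mu must be at least 2 so migration never evicts an island's best")
    rng = random.Random(seed)
    islands = [
        sorted(([rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(mu)),
               key=fitness)
        for _ in range(n_islands)
    ]
    start_best = min(fitness(pop[0]) for pop in islands)
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, fitness, n_offspring, sigma, rng)
                   for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: a copy of each island's best individual
            # replaces the worst individual on the next island.
            bests = [pop[0] for pop in islands]
            for i, pop in enumerate(islands):
                pop[-1] = list(bests[i - 1])
                pop.sort(key=fitness)
    end_best = min(fitness(pop[0]) for pop in islands)
    return start_best, end_best
```

In an asynchronous parallel version, each island would run on its own node and exchange migrants without waiting for the others; the serial loop here only shows the population structure.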
Typhoid in Kenya is associated with a dominant multidrug-resistant Salmonella enterica serovar Typhi haplotype that is also widespread in Southeast Asia.
Centre for Microbiology Research, Kenya Medical Research Institute, P.O. Box 43640-00100, Nairobi, Kenya. firstname.lastname@example.org
In sub-Saharan Africa, the burden of typhoid fever, caused by Salmonella enterica serovar Typhi, remains largely unknown, in part because of a lack of blood or bone marrow culture facilities. We characterized a total of 323 S. Typhi isolates from outbreaks in Kenya over the period 1988 to 2008 for antimicrobial susceptibilities and phylogenetic relationships using single-nucleotide polymorphism (SNP) analysis. There was a dramatic increase in the number and percentage of multidrug-resistant (MDR) S. Typhi isolates over the study period. Overall, only 54 (16.7%) S. Typhi isolates were fully sensitive, while the majority, 195 (60.4%), were multiply resistant to most commonly available drugs--ampicillin, chloramphenicol, tetracycline, and cotrimoxazole; 74 (22.9%) isolates were resistant to a single antimicrobial, usually ampicillin, cotrimoxazole, or tetracycline. Resistance to these antibiotics was encoded on self-transferrable IncHI1 plasmids of the ST6 sequence type. Of the 94 representative S. Typhi isolates selected for genome-wide haplotype analysis, sensitive isolates fell into several phylogenetically different groups, whereas MDR isolates all belonged to a single haplotype, H58, associated with MDR and decreased ciprofloxacin susceptibility, which is also dominant in many parts of Southeast Asia. Derivatives of the same S. Typhi lineage, H58, are responsible for multidrug resistance in Kenya and parts of Southeast Asia, suggesting intercontinental spread of a single MDR clone. Given the emergence of this aggressive MDR haplotype, careful selection and monitoring of antibiotic usage will be required in Kenya, and potentially other regions of sub-Saharan Africa.
Funded by: Wellcome Trust: 064616/01/Z.
Journal of clinical microbiology 2010;48;6;2171-6
The burden and characteristics of enteric fever at a healthcare facility in a densely populated area of Kathmandu.
Oxford University Clinical Research Unit, Patan Academy of Health Sciences, Lagankhel, Kathmandu, Nepal.
Enteric fever, caused by Salmonella enterica serovars Typhi and Paratyphi A (S. Typhi and S. Paratyphi A) remains a major public health problem in many settings. The disease is limited to locations with poor sanitation which facilitates the transmission of the infecting organisms. Efficacious and inexpensive vaccines are available for S. Typhi, yet are not commonly deployed to control the disease. Lack of vaccination is due partly to uncertainty of the disease burden arising from a paucity of epidemiological information in key locations. We have collected and analyzed data from 3,898 cases of blood culture-confirmed enteric fever from Patan Hospital in Lalitpur Sub-Metropolitan City (LSMC), between June 2005 and May 2009. Demographic data was available for a subset of these patients (n = 527) that were resident in LSMC and who were enrolled in trials. We show a considerable burden of enteric fever caused by S. Typhi (2,672; 68.5%) and S. Paratyphi A (1,226; 31.5%) at this Hospital over a four year period, which correlate with seasonal fluctuations in rainfall. We found that local population density was not related to incidence and we identified a focus of infections in the east of LSMC. With data from patients resident in LSMC we found that the median age of those with S. Typhi (16 years) was significantly less than S. Paratyphi A (20 years) and that males aged 15 to 25 were disproportionately infected. Our findings provide a snapshot into the epidemiological patterns of enteric fever in Kathmandu. The uneven distribution of enteric fever patients within the population suggests local variation in risk factors, such as contaminated drinking water. These findings are important for initiating a vaccination scheme and improvements in sanitation. We suggest any such intervention should be implemented throughout the LSMC area.
Funded by: Medical Research Council: G0600718; Wellcome Trust
PloS one 2010;5;11;e13988
Mass Spectrometry for Microbial Proteomics: Issues in Data Analysis with Electrophoretic or Mass Spectrometric Expression Proteomic Data
Mass Spectrometry for Microbial Proteomics 2010;Chapter 18;423-40
European lactase persistence genotype shows evidence of association with increase in body mass index.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. email@example.com
The global prevalence of obesity has increased significantly in recent decades, mainly due to excess calorie intake and increasingly sedentary lifestyle. Here, we test the association between obesity measured by body mass index (BMI) and one of the best-known genetic variants showing strong selective pressure: the functional variant in the cis-regulatory element of the lactase gene. We tested this variant since it is presumed to provide nutritional advantage in specific physical and cultural environments. We genetically defined lactase persistence (LP) in 31 720 individuals from eight European population-based studies and one family study by genotyping or imputing the European LP variant (rs4988235). We performed a meta-analysis by pooling the beta-coefficient estimates of the relationship between rs4988235 and BMI from the nine studies and found that the carriers of the allele responsible for LP among Europeans showed higher BMI (P = 7.9 × 10⁻⁵). Since this locus has been shown to be prone to population stratification, we paid special attention to reveal any population substructure which might be responsible for the association signal. The best evidence of exclusion of stratification came from the Dutch family sample which is robust for stratification. In this study, we highlight issues in model selection in the genome-wide association studies and problems in imputation of these special genomic regions.
Funded by: CCR NIH HHS: N01-RC-37004, N01-RC-45035; Medical Research Council: G0600705; NCI NIH HHS: N01-CN-45165; NHLBI NIH HHS: 1-R01-HL087679-01
Human molecular genetics 2010;19;6;1129-36
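Pooling per-study beta coefficients, as described in the lactase persistence entry above, is commonly done with an inverse-variance weighted fixed-effect model. The abstract does not say which pooling model was used, so the sketch below is an assumption; the function names are hypothetical, and no study data from the paper are reproduced.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error.

    betas: per-study effect estimates; ses: their standard errors.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def two_sided_p(beta, se):
    """Two-sided p-value for beta / se under a standard normal null."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))
```

Studies with smaller standard errors receive larger weights, so a large cohort contributes more to the pooled estimate than a small one.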
Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe.
Integrative Omics Research Centre, Korea Research Institute of Bioscience and Biotechnology, Yuseong, Daejeon, Korea.
We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome providing a tool for studying eukaryotic biology. Comprehensive gene dispensability comparisons with budding yeast--the only other eukaryote for which a comprehensive knockout library exists--revealed that 83% of single-copy orthologs in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than nonessential genes to be present in a single copy, to be broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth.
Funded by: Cancer Research UK; Wellcome Trust: 093917
Nature biotechnology 2010;28;6;617-23
Loss of NPC1 function in a patient with a co-inherited novel insulin receptor mutation does not grossly modify the severity of the associated insulin resistance.
Department of Endocrinology, Birmingham Children's Hospital, Steelhouse Lane, Birmingham B4 6NH, United Kingdom.
In Npc1 null mice, a model for Niemann Pick Disease Type C1, it has been reported that hepatocyte insulin receptor function is significantly impaired, consistent with growing evidence that membrane fluidity and microdomain structure have an important role in insulin signal transduction. However, whether insulin receptor function is also compromised in human Niemann Pick Disease Type C1 is unclear. We now report a girl who developed progressive dementia, ataxia and ophthalmoplegia from 9 years old, followed by severe acanthosis nigricans, hirsutism and acne at 11 years old. She was diagnosed with Niemann Pick Disease Type C1 (OMIM#257220) based on positive filipin staining and reduced cholesterol-esterifying activity in dermal fibroblasts, and homozygosity for the p.Ile1061Thr NPC1 mutation. Further analysis revealed her also to be heterozygous for a novel trinucleotide deletion (c.3659 + 1_3659 + 3delGTG) at the end of exon 20 of INSR, encoding the insulin receptor, leading to deletion of Trp1193 in the intracellular tyrosine kinase domain. INSR mRNA and protein levels were normal in dermal fibroblasts, consistent with a primary signal transduction defect in the mutant receptor. Although the proband was significantly more insulin resistant than her father, who carried the INSR mutation but was only heterozygous for the NPC1 variant, their respective degrees of IR were very similar to those previously reported in a father-daughter pair with the closely related p.Trp1193Leu INSR mutation. This suggests that loss of NPC1 function, with attendant changes in membrane cholesterol composition, does not significantly modify the IR phenotype, even in the context of severely impaired INSR function.
Funded by: Medical Research Council; Wellcome Trust: 077016, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z
Journal of inherited metabolic disease 2010;33 Suppl 3;S227-32
Identification of networks of co-occurring, tumor-related DNA copy number changes using a genome-wide scoring approach.
Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
Tumorigenesis is a multi-step process in which normal cells transform into malignant tumors following the accumulation of genetic mutations that enable them to evade the growth control checkpoints that would normally suppress their growth or result in apoptosis. It is therefore important to identify those combinations of mutations that collaborate in cancer development and progression. DNA copy number alterations (CNAs) are one of the ways in which cancer genes are deregulated in tumor cells. We hypothesized that synergistic interactions between cancer genes might be identified by looking for regions of co-occurring gain and/or loss. To this end we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. The resulting regions of high co-occurrence can be investigated for between-region functional interactions. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively, showing that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. These networks are also highly enriched for functional relationships between genes. We further examine sub-networks of these networks, core networks, which contain many known cancer genes. The core network for co-occurring DNA losses we find seems to be independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number alterations may be an important feature of cancer development or maintenance by affecting gene dosage of a large interconnected network of functionally related genes.
PLoS computational biology 2010;6;1;e1000631
AnnoTrack--a tracking system for genome annotation.
Vertebrate Genome Analysis, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. firstname.lastname@example.org
Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.
Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.
Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.
Funded by: NHGRI NIH HHS: 5U54HG004555; Wellcome Trust: 077198, WT077198/Z/05/Z
BMC genomics 2010;11;538
Slingshot: a PiggyBac based transposon system for tamoxifen-inducible 'self-inactivating' insertional mutagenesis.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
We have developed a self-inactivating PiggyBac transposon system for tamoxifen inducible insertional mutagenesis from a stably integrated chromosomal donor. This system, which we have named 'Slingshot', utilizes a transposon carrying elements for both gain- and loss-of-function screens in vitro. We show that the Slingshot transposon can be efficiently mobilized from a range of chromosomal loci with high inducibility and low background generating insertions that are randomly dispersed throughout the genome. Furthermore, we show that once the Slingshot transposon has been mobilized it is not remobilized producing stable clonal integrants in all daughter cells. To illustrate the efficacy of Slingshot as a screening tool we set out to identify mediators of resistance to puromycin and the chemotherapeutic drug vincristine by performing genetrap screens in mouse embryonic stem cells. From these genome-wide screens we identified multiple independent insertions in the multidrug resistance transporter genes Abcb1a/b and Abcg2 conferring resistance to drug treatment. Importantly, we also show that the Slingshot transposon system is functional in other mammalian cell lines such as human HEK293, OVCAR-3 and PE01 cells suggesting that it may be used in a range of cell culture systems. Slingshot represents a flexible and potent system for genome-wide transposon-mediated mutagenesis with many potential applications.
Funded by: Cancer Research UK; Wellcome Trust
Nucleic acids research 2010;38;18;e173
Insertional mutagenesis in mice deficient for p15Ink4b, p16Ink4a, p21Cip1, and p27Kip1 reveals cancer gene interactions and correlations with tumor phenotypes.
Division of Molecular Genetics, The Centre of Biomedical Genetics, Academic Medical Center and Cancer Genomics Centre, Netherlands Cancer Institute, 1066CX, Amsterdam, the Netherlands.
The cyclin dependent kinase (CDK) inhibitors p15, p16, p21, and p27 are frequently deleted, silenced, or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, showing that these genes function as tumor suppressors. Here, we describe high-throughput murine leukemia virus insertional mutagenesis screens in mice that are deficient for one or two CDK inhibitors. We retrieved 9,117 retroviral insertions from 476 lymphomas to define hundreds of loci that are mutated more frequently than expected by chance. Many of these loci are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also found associations of these loci with gender, age of tumor onset, and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with single nucleotide polymorphisms associated with chronic lymphocytic leukemia revealed a significant overlap between the datasets. Together, our findings highlight the importance of genetic context within large-scale mutation detection studies, and they show a novel use for insertional mutagenesis data in prioritizing disease-associated genes that emerge from genome-wide association studies.
Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356
Cancer research 2010;70;2;520-31
Microindel detection in short-read sequence data.
Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin. email@example.com
Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.
Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region eir, which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contains indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics (Oxford, England) 2010;26;6;722-9
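The "equivalent indel region" mentioned in the microindel entry above reflects the fact that, in repetitive sequence, the same indel can be placed at several positions and still yield an identical sequence. The sketch below illustrates that idea for deletions only. It is an assumption-laden illustration rather than the published algorithm; the function names and the 0-based coordinate convention are choices made here.

```python
def apply_deletion(ref, start, length):
    """Return ref with `length` bases removed beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref, start, length):
    """Leftmost and rightmost start positions that give the same deleted sequence.

    The deletion is shifted one base at a time for as long as the shift
    leaves the resulting sequence unchanged.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    left = start
    # Shifting left by one is neutral when the base entering the deleted
    # window equals the base leaving it.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right
```

For example, with the reference `GACACAT`, a 2-bp deletion reported at position 2 can equally be placed anywhere from position 1 to position 4, so two callers that report different coordinates within this range may be describing the same event.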
Hundreds of variants clustered in genomic loci and biological pathways affect human height.
Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Exeter EX1 2LU, UK.
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Funded by: British Heart Foundation: PG/02/128, PG/02/128/14470; Cancer Research UK; Chief Scientist Office: CZB/4/276, CZB/4/279, CZB/4/710; Intramural NIH HHS; Medical Research Council: G0000649, G0000934, G0500539, G0600331, G0600331(77796), G0601261, G0701863, G9521010, G9521010(63660), G9521010D, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, P01 CA087969, P01 CA087969-12, R01 CA047988, R01 CA047988-20, R01 CA050385, R01 CA050385-20, R01 CA065725, R01 CA065725-14, R01 CA067262, R01 CA067262-14, R01 CA104021, R01 CA104021-02, U01 CA049449, U01 CA049449-21, U01 CA098233, U01 CA098233-08, U01-CA098233; NCRR NIH HHS: M01-RR00425, U54-RR020278, UL1-RR025005; NHGRI NIH HHS: HG002651, HG005214, HG005581, R01 HG002651, R01 HG002651-05, RC2 HG005581, RC2 HG005581-02, T32-HG00040, U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02, U01 HG005214, U01 HG005214-02, U01-HG004399, U01-HG004402, Z01-HG000024; NHLBI NIH HHS: HL043851, HL084729, HL69757, HL71981, K99-HL094535, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC55018, N01-HC55019, N01-HC55020, N01-HC55021, N01-HC55022, N01-HC55222, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, N02-HL-6-4278, R01 HL043851, R01 HL043851-10, R01 HL059367, R01 HL059367-10, R01 HL071981, R01 HL071981-07, R01 HL086694, R01 HL086694-02, R01 HL087641, R01 HL087641-01, R01 HL087647, R01 HL087647-01, R01 HL087652, R01 HL087652-01, R01 HL087676, R01 HL087676-01, R01 HL087679-01, R01 HL087700, R01 HL087700-03, R01 HL088119, R01 HL088119-01, R01 HL117078, R01-HL086694, R01-HL087641, R01-HL087647, R01-HL087652, R01-HL087676, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL59367, U01 HL069757, U01 HL069757-10, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04, U01 HL084729, U01 HL084729-03, U01 HL084756, U01 HL084756-03, U01-HL080295, 
U01-HL084756, U01-HL72515; NIA NIH HHS: N01-AG12100, N01-AG12109, R01 AG031890, R01 AG031890-02, R01-AG031890, Z01-AG00675, Z01-AG007380; NIAAA NIH HHS: AA014041, AA07535, AA10248, AA13320, AA13321, AA13326, K05 AA017688, R01 AA007535, R01 AA007535-08, R01 AA013320-04, R01 AA013321, R01 AA013321-05, R01 AA013326-05, R01 AA014041-05; NIAMS NIH HHS: K08 AR055688, K08 AR055688-03, K08 AR055688-04, K08-AR055688; NIDA NIH HHS: DA12854, R01 DA012854, R01 DA012854-09; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK079466, DK080145, DK46200, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30 DK072488, R01 DK058845, R01 DK058845-11, R01 DK068336, R01 DK068336-01, R01 DK072193, R01 DK072193-05, R01 DK073490, R01 DK073490-01, R01 DK075681, R01 DK075681-02, R01 DK075787, R01 DK075787-03, R01 DK089256, R01 DK091718, R01-DK068336, R01-DK073490, R01-DK075681, R01-DK075787, U01 DK062370, U01 DK062370-08, U01 DK062418; NIGMS NIH HHS: U01 GM074518, U01 GM074518-05, U01-GM074518; NIMH NIH HHS: MH084698, R01 MH059160, R01 MH059160-04, R01 MH059565, R01 MH059565-06, R01 MH059566, R01 MH059566-08, R01 MH059571, R01 MH059571-05, R01 MH059586, R01 MH059586-08, R01 MH059587-09, R01 MH059588-08, R01 MH060870-09, R01 MH060879, R01 MH060879-08, R01 MH061675, R01 MH061675-09, R01 MH067257-04, R01 MH081800, R01 MH081800-01, R01-MH059160, R01-MH59565, R01-MH59566, R01-MH59571, R01-MH59586, R01-MH59587, R01-MH59588, R01-MH60870, R01-MH60879, R01-MH61675, R01-MH63706, R01-MH67257, R01-MH79469, R01-MH81800, RL1 MH083268, RL1 MH083268-05, RL1-MH083268, U01 MH079469, U01 MH079469-03, U01 MH079470, U01 MH079470-03, U01-MH79469, U01-MH79470; PHS HHS: 263-MA-410953, HHSN268200625226C, N01-G65403; Wellcome Trust: 064890, 068545, 068545/Z/02, 072856, 072960, 075491, 076113, 076113/B/04/Z, 076113/C/04/Z, 077016, 077016/Z/05/Z, 079557, 079771, 079895, 081682, 081682/Z/06/Z, 083270, 084183/Z/07/Z, 085301, 085301/Z/08/Z, 086596, 086596/Z/08/Z, 088885, 090532, 091746, 
091746/Z/10/Z
Use of purified Clostridium difficile spores to facilitate evaluation of health care disinfection regimens.
Microbial Pathogenesis Laboratory, Wellcome Trust, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. firstname.lastname@example.org
Clostridium difficile is a major cause of antibiotic-associated diarrheal disease in many parts of the world. In recent years, distinct genetic variants of C. difficile that cause severe disease and persist within health care settings have emerged. Highly resistant and infectious C. difficile spores are proposed to be the main vectors of environmental persistence and host transmission, so methods to accurately monitor spores and their inactivation are urgently needed. Here we describe simple quantitative methods, based on purified C. difficile spores and a murine transmission model, for evaluating health care disinfection regimens. We demonstrate that disinfectants that contain strong oxidizing active ingredients, such as hydrogen peroxide, are very effective in inactivating pure spores and blocking spore-mediated transmission. Complete inactivation of 10⁶ pure C. difficile spores on indicator strips, a six-log reduction, and a standard measure of stringent disinfection regimens require at least 5 min of exposure to hydrogen peroxide vapor (HPV; 400 ppm). In contrast, a 1-min treatment with HPV was required to disinfect an environment that was heavily contaminated with C. difficile spores (17 to 29 spores/cm²) and block host transmission. Thus, pure C. difficile spores facilitate practical methods for evaluating the efficacy of C. difficile spore disinfection regimens and bringing scientific acumen to C. difficile infection control.
Funded by: Medical Research Council: G0901743; Wellcome Trust
Applied and environmental microbiology 2010;76;20;6895-900
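The "six-log reduction" quoted in the C. difficile abstract above is a base-10 logarithm of the ratio of starting to surviving spores. The following is a minimal sketch of that arithmetic, not the authors' protocol; the function name and the detection_limit parameter are assumptions introduced here so that a complete kill (zero survivors) yields a finite lower bound.

```python
import math

def log_reduction(initial, surviving, detection_limit=1.0):
    """Return the log10 reduction from `initial` to `surviving` viable spores.

    `detection_limit` is a hypothetical floor (assumed, not from the paper):
    when no survivors are recovered, the result is the lower bound set by
    the inoculum size rather than infinity.
    """
    survivors = max(surviving, detection_limit)
    return math.log10(initial / survivors)

# Complete inactivation of a 10^6-spore indicator strip: at least six logs.
six_log = log_reduction(1e6, 0)
# A partial kill leaving 10^3 viable spores: three logs.
three_log = log_reduction(1e6, 1e3)
```

With the default floor of one spore, the reported reduction can never exceed log10 of the inoculum, which is why a 10⁶-spore strip is the natural indicator for a six-log standard.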
CCRaVAT and QuTie-enabling analysis of rare variants in large-scale case control and quantitative trait association studies.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Background: Genome-wide association studies have been successful in finding common variants influencing common traits. However, these associations only account for a fraction of trait heritability. There has been a shift in the field towards studying low frequency and rare variants, which are now widely recognised as putative complex trait determinants. Despite this increasing focus on examining the role of low frequency and rare variants in complex disease susceptibility, there is a lack of user-friendly analytical packages implementing powerful association tests for the analysis of rare variants.
Results: We have developed two software tools, CCRaVAT (Case-Control Rare Variant Analysis Tool) and QuTie (Quantitative Trait), which enable efficient large-scale analysis of low frequency and rare variants. Both programs implement a collapsing method examining the accumulation of low frequency and rare variants across a locus of interest that has more power than single variant analysis. CCRaVAT carries out case-control analyses whereas QuTie has been developed for continuous trait analysis.
Conclusions: CCRaVAT and QuTie are easy-to-use software tools that allow users to perform genome-wide association analysis on low frequency and rare variants for both binary and quantitative traits. The software is freely available and provides the genetics community with a resource to perform association analysis on rarer genetic variants.
Funded by: Wellcome Trust: 064890, 079557, 079557MA, 081682, WT088885/Z/09/Z
BMC bioinformatics 2010;11;527
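The collapsing idea mentioned in the CCRaVAT and QuTie abstract above can be illustrated with a small case-control sketch. This is a generic collapsing test written for illustration, not the CCRaVAT implementation: the function names, the pooled-sample frequency estimate, the default threshold and the choice of a Pearson chi-square test are all assumptions.

```python
import math

def collapse_carriers(genotypes, maf_threshold=0.05):
    """Return 1 per individual carrying at least one rare minor allele, else 0.

    genotypes: list of rows (one per individual), each a list of minor-allele
    counts (0, 1 or 2) for the variants in the locus.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor-allele frequency per variant, estimated from the pooled sample.
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    rare = [j for j, f in enumerate(mafs) if 0 < f <= maf_threshold]
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]

def carrier_chi_square(carriers, is_case):
    """Pearson chi-square (1 df) on the 2x2 carrier-by-status table."""
    a = sum(1 for c, s in zip(carriers, is_case) if c and s)          # case carriers
    b = sum(1 for c, s in zip(carriers, is_case) if not c and s)      # case non-carriers
    c_ = sum(1 for c, s in zip(carriers, is_case) if c and not s)     # control carriers
    d = sum(1 for c, s in zip(carriers, is_case) if not c and not s)  # control non-carriers
    n = a + b + c_ + d
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c_) ** 2 / denom
    # Upper tail of a chi-square distribution with one degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

Collapsing trades per-variant resolution for a single test per locus, which is where the power gain over single-variant analysis comes from when several rare alleles act in the same direction. With small counts an exact test would be a more cautious choice than the chi-square approximation used here.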
Phylogenetic analysis of gene structure and alternative splicing in alpha-actinins.
Institute for Neuroscience and Muscle Research, The Children's Hospital at Westmead, Sydney, NSW, Australia.
The alpha-actinins are an important family of actin-binding proteins with the ability to cross-link actin filaments when in dimer form. Members of the alpha-actinin family share a domain topology composed of highly conserved actin-binding and EF-hand domains separated by a rod domain composed of spectrin-like repeats. Functional diversity within this family has arisen through exon duplication and the formation of alternate splice isoforms as well as gene duplications during the evolution of vertebrates. In addition to the known functional domains, alpha-actinins also contain a consensus PDZ-binding site. The completed genome sequence of over 32 invertebrate species has allowed the analysis of gene structure and exon-gene duplication over a diverse range of phyla. Our analysis shows that relative to early branching metazoans, there has been considerable intron loss especially in arthropods with few cases of intron gains. The C-terminal PDZ-binding site is conserved in nearly all invertebrates but is missing in some nematodes and platyhelminths. Alternative splicing in the actin-binding domain is conserved in chordates, arthropods, and some nematodes and platyhelminths. In contrast, alternative splicing of the EF-hand domain is only observed in chordates. Finally, given the prevalence of exon duplications seen in the actin-binding domain, this may act as a significant mechanism in the modification of actin-binding properties.
Molecular biology and evolution 2010;27;4;773-80
The genome of a pathogenic rhodococcus: cooptive virulence underpinned by key gene acquisitions.
Microbial Pathogenesis Unit, Centres for Infectious Diseases and Immunity, Infection, and Evolution, University of Edinburgh, Edinburgh, United Kingdom.
We report the genome of the facultative intracellular parasite Rhodococcus equi, the only animal pathogen within the biotechnologically important actinobacterial genus Rhodococcus. The 5.0-Mb R. equi 103S genome is significantly smaller than those of environmental rhodococci. This is due to genome expansion in nonpathogenic species, via a linear gain of paralogous genes and an accelerated genetic flux, rather than reductive evolution in R. equi. The 103S genome lacks the extensive catabolic and secondary metabolic complement of environmental rhodococci, and it displays unique adaptations for host colonization and competition in the short-chain fatty acid-rich intestine and manure of herbivores--two main R. equi reservoirs. Except for a few horizontally acquired (HGT) pathogenicity loci, including a cytoadhesive pilus determinant (rpl) and the virulence plasmid vap pathogenicity island (PAI) required for intramacrophage survival, most of the potential virulence-associated genes identified in R. equi are conserved in environmental rhodococci or have homologs in nonpathogenic Actinobacteria. This suggests a mechanism of virulence evolution based on the cooption of existing core actinobacterial traits, triggered by key host niche-adaptive HGT events. We tested this hypothesis by investigating R. equi virulence plasmid-chromosome crosstalk, by global transcription profiling and expression network analysis. Two chromosomal genes conserved in environmental rhodococci, encoding putative chorismate mutase and anthranilate synthase enzymes involved in aromatic amino acid biosynthesis, were strongly coregulated with vap PAI virulence genes and required for optimal proliferation in macrophages. The regulatory integration of chromosomal metabolic genes under the control of the HGT-acquired plasmid PAI is thus an important element in the cooptive virulence of R. equi.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F003722/1, BB/I001107/1
PLoS genetics 2010;6;9;e1001145
MicroRNAs in mouse development and disease.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
MicroRNAs, small non-coding RNAs which act as repressors of target genes, were discovered in 1993, and since then have been shown to play important roles in the development of numerous systems. Consistent with this role, they are also implicated in the pathogenesis of multiple diseases. Here we review the involvement of microRNAs in mouse development and disease, with particular reference to deafness as an example.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust
Seminars in cell & developmental biology 2010;21;7;774-80
Reprogramming of T cells to natural killer-like cells upon Bcl11b deletion.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
T cells develop in the thymus and are critical for adaptive immunity. Natural killer (NK) lymphocytes constitute an essential component of the innate immune system in tumor surveillance, reproduction, and defense against microbes and viruses. Here, we show that the transcription factor Bcl11b was expressed in all T cell compartments and was indispensable for T lineage development. When Bcl11b was deleted, T cells from all developmental stages acquired NK cell properties and concomitantly lost or decreased T cell-associated gene expression. These induced T-to-natural killer (ITNK) cells, which were morphologically and genetically similar to conventional NK cells, killed tumor cells in vitro, and effectively prevented tumor metastasis in vivo. Therefore, ITNKs may represent a new cell source for cell-based therapies.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0501150, G0800784, G116/187; Wellcome Trust: 076962, 077186
Science (New York, N.Y.) 2010;329;5987;85-9
Meta-analysis and imputation refines the association of 15q25 with smoking quantity.
Department of Statistics, University of Oxford, Oxford, UK.
Smoking is a leading global cause of disease and mortality. We established the Oxford-GlaxoSmithKline study (Ox-GSK) to perform a genome-wide meta-analysis of SNP association with smoking-related behavioral traits. Our final data set included 41,150 individuals drawn from 20 disease, population and control cohorts. Our analysis confirmed an effect on smoking quantity at a locus on 15q25 (P = 9.45 x 10(-19)) that includes CHRNA5, CHRNA3 and CHRNB4, three genes encoding neuronal nicotinic acetylcholine receptor subunits. We used data from the 1000 Genomes project to investigate the region using imputation, which allowed for analysis of virtually all common SNPs in the region and offered a fivefold increase in marker density over HapMap2 (ref. 2) as an imputation reference panel. Our fine-mapping approach identified a SNP showing the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.
Funded by: Chief Scientist Office: CZB/4/540, CZB/4/710, ETM/75; Intramural NIH HHS: Z99 AG999999, ZIA AG000196-03, ZIA AG000196-04; Medical Research Council: G0401527, G0600329, G0701863, G0800759, G9521010, MC_U106179471, MC_U106188470, MC_U127561128
Nature genetics 2010;42;5;436-40
Characterisations of odorant-binding proteins in the tsetse fly Glossina morsitans morsitans.
Department of Biological Chemistry, Harpenden, UK.
Odorant-binding proteins (OBPs) play an important role in insect olfaction by mediating interactions between odorants and odorant receptors. We report for the first time 20 OBP genes in the tsetse fly Glossina morsitans morsitans. qRT-PCR revealed that 8 of these genes were highly transcribed in the antennae. The transcription of these genes in the antennae was significantly lower in males than in females and there was a clear correlation between OBP gene transcription and feeding status. Starvation over 72 h post-blood meal (PBM) did not significantly affect the transcription. However, the transcription in the antennae of 10-week-old flies was much higher than in 3-day-old flies at 48 h PBM and decreased sharply after 72 h starvation, suggesting that the OBP gene expression is affected by the insect's nutritional status. Sequence comparisons with OBPs of other Dipterans identified several homologs to sex pheromone-binding proteins and OBPs of Drosophila melanogaster.
Funded by: Wellcome Trust: WT085775/Z/08/Z
Cellular and molecular life sciences : CMLS 2010;67;6;919-29
Origin of the human malaria parasite Plasmodium falciparum in gorillas.
Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
Plasmodium falciparum is the most prevalent and lethal of the malaria parasites infecting humans, yet the origin and evolutionary history of this important pathogen remain controversial. Here we develop a single-genome amplification strategy to identify and characterize Plasmodium spp. DNA sequences in faecal samples from wild-living apes. Among nearly 3,000 specimens collected from field sites throughout central Africa, we found Plasmodium infection in chimpanzees (Pan troglodytes) and western gorillas (Gorilla gorilla), but not in eastern gorillas (Gorilla beringei) or bonobos (Pan paniscus). Ape plasmodial infections were highly prevalent, widely distributed and almost always made up of mixed parasite species. Analysis of more than 1,100 mitochondrial, apicoplast and nuclear gene sequences from chimpanzees and gorillas revealed that 99% grouped within one of six host-specific lineages representing distinct Plasmodium species within the subgenus Laverania. One of these from western gorillas comprised parasites that were nearly identical to P. falciparum. In phylogenetic analyses of full-length mitochondrial sequences, human P. falciparum formed a monophyletic lineage within the gorilla parasite radiation. These findings indicate that P. falciparum is of gorilla origin and not of chimpanzee, bonobo or ancient human origin.
Funded by: Howard Hughes Medical Institute; NIAID NIH HHS: P30 AI 7767, P30 AI027767, P30 AI027767-21A1, R01 AI058715, R01 AI058715-06A1, R01 AI058715-07, R01 AI50529, R03 AI074778, R03 AI074778-02, R37 AI050529, R37 AI050529-07, R37 AI050529-08, T32 AI007245, T32 AI007245-26, U19 AI 067854, U19 AI067854, U19 AI067854-06; NIGMS NIH HHS: T32 GM008111, T32 GM008111-13; PHS HHS: R01 I58715; Wellcome Trust
Single genome amplification and direct amplicon sequencing of Plasmodium spp. DNA from ape fecal specimens.
Department of Medicine, University of Alabama at Birmingham.
Conventional PCR followed by molecular cloning and sequencing of amplified products is commonly used to test clinical specimens for target sequences of interest, such as viral, bacterial or parasite nucleic acids. However, this approach has serious limitations when used to analyze mixtures of genetically divergent templates(1-9). This is because Taq polymerase is prone to switch templates during the amplification process, thereby generating recombinants that do not exist in vivo (4). When amplicons are cloned prior to sequence analysis, the resulting sequences may also contain a substantial number of Taq-induced substitutions(1-4). Finally, cloning of amplicons can lead to a non-proportional representation of sequences due to the re-sampling of only certain templates(1-4). These confounders can be avoided by using single genome amplification (SGA) followed by direct sequencing of SGA amplicons(1-5). While SGA is not required for many research applications, we have shown it to be essential for deciphering the diversification pathways of human and simian immunodeficiency viruses (HIV/SIV) in acute and chronic infection(4-7), the detection of simian foamy virus (SFVCPZ) super-infection in wild-living chimpanzees(8), and most recently, the molecular identification and characterization of Plasmodium spp. infections in wild-living apes(9). Here, we describe SGA-direct amplicon sequencing of Plasmodium spp. DNA from ape fecal samples.
Funded by: NIAID NIH HHS: R37 AI050529
Protocol exchange 2010;2010
Ten simple rules for editing Wikipedia.
PLoS computational biology 2010;6;9
Loss-of-function variants in the genomes of healthy humans.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Genetic variants predicted to seriously disrupt the function of human protein-coding genes--so-called loss-of-function (LOF) variants--have traditionally been viewed in the context of severe Mendelian disease. However, recent large-scale sequencing and genotyping projects have revealed a surprisingly large number of these variants in the genomes of apparently healthy individuals--at least 100 per genome, including more than 30 in a homozygous state--suggesting a previously unappreciated level of variation in functional gene content between humans. These variants are mostly found at low frequency, suggesting that they are enriched for mildly deleterious polymorphisms suppressed by negative natural selection, and thus represent an attractive set of candidate variants for complex disease susceptibility. However, they are also enriched for sequencing and annotation artefacts, so overall present serious challenges for clinical sequencing projects seeking to identify severe disease genes amidst the 'noise' of technical error and benign genetic polymorphism. Systematic, high-quality catalogues of LOF variants present in the genomes of healthy individuals, built from the output of large-scale sequencing studies such as the 1000 Genomes Project, will help to distinguish between benign and disease-causing LOF variants, and will provide valuable resources for clinical genomics.
Funded by: Wellcome Trust
Human molecular genetics 2010;19;R2;R125-30
Dysregulated humoral immunity to nontyphoidal Salmonella in HIV-infected African adults.
Medical Research Council Centre for Immune Regulation and Clinical Immunology Service, Institute of Biomedical Research, School of Immunity and Infection, University of Birmingham, Birmingham, UK.
Nontyphoidal Salmonellae are a major cause of life-threatening bacteremia among HIV-infected individuals. Although cell-mediated immunity controls intracellular infection, antibodies protect against Salmonella bacteremia. We report that high-titer antibodies specific for Salmonella lipopolysaccharide (LPS) are associated with a lack of Salmonella-killing in HIV-infected African adults. Killing was restored by genetically shortening LPS from the target Salmonella or removing LPS-specific antibodies from serum. Complement-mediated killing of Salmonella by healthy serum is shown to be induced specifically by antibodies against outer membrane proteins. This killing is lost when excess antibody against Salmonella LPS is added. Thus, our study indicates that impaired immunity against nontyphoidal Salmonella bacteremia in HIV infection results from excess inhibitory antibodies against Salmonella LPS, whereas serum killing of Salmonella is induced by antibodies against outer membrane proteins.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust
Science (New York, N.Y.) 2010;328;5977;508-12
Meeting report: a workshop on Best Practices in Genome Annotation.
Informatics, J. Craig Venter Institute, Rockville, MD 20850 USA, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK and The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA 94305 USA.
Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the 'Best Practices in Genome Annotation: Inference from Evidence' workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198
Database : the journal of biological databases and curation 2010;2010;baq001
FRT-seq: amplification-free, strand-specific transcriptome sequencing.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
We report an alternative approach to transcriptome sequencing for the Illumina Genome Analyzer, in which the reverse transcription reaction takes place on the flowcell. No amplification is performed during the library preparation, so PCR biases and duplicates are avoided, and because the template is poly(A)+ RNA rather than cDNA, the resulting sequences are necessarily strand-specific. The method is compatible with paired- or single-end sequencing.
Funded by: Wellcome Trust: 079643, WT079643
Nature methods 2010;7;2;130-2
Target-enrichment strategies for next-generation sequencing.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.
Funded by: NHGRI NIH HHS: 5R21HG004749, R21 HG004749; NHLBI NIH HHS: 5R01HL094976, R01 HL094976; Wellcome Trust: WT079643
Nature methods 2010;7;2;111-8
Construction of a large extracellular protein interaction network and its resolution by spatiotemporal expression profiling.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.
Extracellular interactions involving both secreted and membrane-tethered receptor proteins are essential to initiate signaling pathways that orchestrate cellular behaviors within biological systems. Because of the biochemical properties of these proteins and their interactions, identifying novel extracellular interactions remains experimentally challenging. To address this, we have recently developed an assay, AVEXIS (avidity-based extracellular interaction screen) to detect low affinity extracellular interactions on a large scale and have begun to construct interaction networks between zebrafish receptors belonging to the immunoglobulin and leucine-rich repeat protein families to identify novel signaling pathways important for early development. Here, we expanded our zebrafish protein library to include other domain families and many more secreted proteins and performed our largest screen to date totaling 16,544 potential unique interactions. We report 111 interactions of which 96 are novel and include the first documented extracellular ligands for 15 proteins. By including 77 interactions from previous screens, we assembled an expanded network of 188 extracellular interactions between 92 proteins and used it to show that secreted proteins have twice as many interaction partners as membrane-tethered receptors and that the connectivity of the extracellular network behaves as a power law. To try to understand the functional role of these interactions, we determined new expression patterns for 164 genes within our clone library by using whole embryo in situ hybridization at five key stages of zebrafish embryonic development. These expression data were integrated with the binding network to reveal where each interaction was likely to function within the embryo and were used to resolve the static interaction network into dynamic tissue- and stage-specific subnetworks within the developing zebrafish embryo. 
All these data were organized into a freely accessible on-line database called ARNIE (AVEXIS Receptor Network with Integrated Expression; www.sanger.ac.uk/arnie) and provide a valuable resource of new extracellular signaling interactions for developmental biology.
Funded by: Medical Research Council; Wellcome Trust: 077108/Z/05/Z
Molecular & cellular proteomics : MCP 2010;9;12;2654-65
Novel candidate cancer genes identified by a large-scale cross-species comparative oncogenomics approach.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
Comparative genomic hybridization (CGH) can reveal important disease genes but the large regions identified could sometimes contain hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the murine leukemia virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes mutated in both human and mouse tumors, making them strong candidates for novel cancer genes. A significant number of these genes contained binding sites for the stem cell transcription factors Oct4 and Nanog. Notably, mice carrying tumors with insertions in or near stem cell module genes, which are thought to participate in cell self-renewal, died significantly faster than mice without these insertions. A comparison of the profile we identified to that induced with the Sleeping Beauty (SB) transposon system revealed significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of new candidate cancer genes for functional analysis.
Funded by: Cancer Research UK: A6997, A8784; NCI NIH HHS: K01 CA122183, K01CA122183, R01 CA113636, R01 CA134759; Wellcome Trust: 077198, 082356
Cancer research 2010;70;3;883-95
Regulation of the Epstein-Barr virus Zp promoter in B lymphocytes during reactivation from latency.
Department of Virology, Imperial College Faculty of Medicine, St Mary's Campus, London W2 1PG, UK.
Ten novel mutations were introduced into the Zp promoter to test the role of sequences outside the established transcription factor-binding sites in Epstein-Barr virus (EBV) reactivation. Most of these had only small effects, but mutations in the ZID site were shown to reduce Zp activity strongly at early times after induction by anti-immunoglobulin (anti-Ig). The binding of MEF2 transcription factor to ZID was characterized in detail and linked functionally to Zp promoter activity. The presence of XBP-1s, the active form of XBP-1, after administration of anti-Ig to Akata Burkitt's lymphoma cells is consistent with a role for this factor in reactivation of the EBV lytic cycle, although signalling through MEF2D was quantitatively much more significant in activation of Zp. Silencing of Zp during latency is thought to be primarily a consequence of a repressive chromatin structure on Zp, and this aspect of Zp regulation can be observed in the Akata genome through protection of Zp from activation by BZLF1 in the absence of signalling from the B-cell receptor.
The Journal of general virology 2010;91;Pt 3;622-9
Visualizing chromosome mosaicism and detecting ethnic outliers by the method of "rare" heterozygotes and homozygotes (RHH).
Wellcome Trust Sanger Institute, Cambridge, UK.
We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (<1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity implies that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02, 076113
Human molecular genetics 2010;19;13;2539-53
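The counting step described in the RHH abstract above can be sketched as follows. This is only a rough illustration under stated assumptions: the genotype coding (0, 1, 2, or None for a missing call), the function name and the per-SNP frequency calculation are choices made here, and the published method may define "rare" genotypes and the outlier decision differently. Only the <1% threshold comes from the abstract.

```python
from collections import Counter

def rare_genotype_counts(genotypes, max_freq=0.01):
    """Count, per subject, the SNPs at which the subject has a rare genotype.

    genotypes: list of rows (one per subject); each entry is 0, 1 or 2
    (copies of one allele) or None for a missing call.
    """
    n_subjects = len(genotypes)
    n_snps = len(genotypes[0])
    counts = [0] * n_subjects
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue  # SNP with no calls contributes nothing
        # Frequency of each genotype class among called subjects at this SNP.
        freq = {g: k / len(calls) for g, k in Counter(calls).items()}
        for i, row in enumerate(genotypes):
            g = row[j]
            # A genotype (heterozygous or homozygous) that is rare in this
            # sample adds one to the subject's tally.
            if g is not None and freq[g] < max_freq:
                counts[i] += 1
    return counts
```

Subjects whose tallies sit far above the bulk of the sample would be the candidate outliers; plotting each subject's rare genotypes by chromosomal position is the kind of view that would expose admixed segments.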
Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and API interface can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.
Availability: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.
Funded by: Wellcome Trust
Bioinformatics (Oxford, England) 2010;26;16;2069-70
Genome-wide association studies of serum magnesium, potassium, and sodium concentrations identify six loci influencing serum magnesium levels.
Human Genetics Center and Division of Epidemiology, The University of Texas Health Science Center at Houston, School of Public Health, Houston, Texas, USA.
Magnesium, potassium, and sodium, cations commonly measured in serum, are involved in many physiological processes including energy metabolism, nerve and muscle function, signal transduction, and fluid and blood pressure regulation. To evaluate the contribution of common genetic variation to normal physiologic variation in serum concentrations of these cations, we conducted genome-wide association studies of serum magnesium, potassium, and sodium concentrations using approximately 2.5 million genotyped and imputed common single nucleotide polymorphisms (SNPs) in 15,366 participants of European descent from the international CHARGE Consortium. Study-specific results were combined using fixed-effects inverse-variance weighted meta-analysis. SNPs demonstrating genome-wide significant (p<5 x 10(-8)) or suggestive associations (p<4 x 10(-7)) were evaluated for replication in an additional 8,463 subjects of European descent. The association of common variants at six genomic regions (in or near MUC1, ATP2B1, DCDC5, TRPM6, SHROOM3, and MDS1) with serum magnesium levels was genome-wide significant when meta-analyzed with the replication dataset. All initially significant SNPs from the CHARGE Consortium showed nominal association with clinically defined hypomagnesemia, two showed association with kidney function, two with bone mineral density, and one of these also associated with fasting glucose levels. Common variants in CNNM2, a magnesium transporter studied only in model systems to date, as well as in CNNM3 and CNNM4, were also associated with magnesium concentrations in this study. We observed no associations with serum sodium or potassium levels exceeding p<4 x 10(-7). Follow-up studies of newly implicated genomic loci may provide additional insights into the regulation and homeostasis of human serum magnesium levels.
Funded by: Intramural NIH HHS; NCRR NIH HHS: M01-RR00425, UL1RR025005; NHGRI NIH HHS: U01HG004402; NHLBI NIH HHS: N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-6-4278, R01 HL087652, R01HL087641, U01 HL080295; NIA NIH HHS: N01-AG-12100; NIDDK NIH HHS: DK063491
PLoS genetics 2010;6;8
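The serum cation abstract above combines study-specific results by fixed-effects inverse-variance weighted meta-analysis. The standard form of that estimator is sketched below for a single SNP; the function name and input layout are assumptions, and the consortium's actual software and any corrections it applied are not described in the abstract.

```python
import math

def fixed_effects_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: per-study effect sizes for one SNP.
    ses:   matching standard errors (all > 0).
    Returns (pooled beta, pooled standard error, z score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]   # precise studies count for more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Hypothetical inputs: two studies of equal precision.
pooled = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
```

The fixed-effects model assumes every study estimates the same underlying effect; when effects genuinely differ between cohorts, a random-effects model is the usual alternative.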
Transcriptome genetics using second generation sequencing in a Caucasian population.
Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211 Switzerland.
Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (small nucleotide polymorphisms) leads to a larger discovery of eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.
Funded by: Wellcome Trust: 077046
An evaluation of statistical approaches to rare variant analysis in genetic association studies.
Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom. firstname.lastname@example.org
Genome-wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of as-yet undiscovered genetic determinants of complex traits is the effects of rare variants. With the increasing availability of large-scale re-sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population-based genetic studies when data are available from re-sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re-sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits.
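The idea of searching for accumulations of minor alleles within a functional unit can be sketched as a per-individual count of rare minor alleles across one gene. This is an illustrative simplification, not the statistic developed in the paper: the frequency cutoff, the function name and the toy genotypes are assumptions.

```python
from statistics import mean


def burden_scores(genotypes, minor_allele_freqs, maf_cutoff=0.01):
    """Sum minor-allele counts over rare variants for each individual.

    genotypes: one list per individual, holding 0/1/2 minor-allele counts
               for each variant in the functional unit
    minor_allele_freqs: one frequency per variant
    maf_cutoff: variants below this frequency count as rare (assumed value)
    """
    rare = [j for j, freq in enumerate(minor_allele_freqs) if freq < maf_cutoff]
    return [sum(individual[j] for j in rare) for individual in genotypes]


# Hypothetical toy data: four variants in one gene, three cases, three controls
mafs = [0.004, 0.20, 0.008, 0.002]
cases = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 2, 0, 1]]
controls = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 2, 1, 0]]

case_scores = burden_scores(cases, mafs)
control_scores = burden_scores(controls, mafs)
print(mean(case_scores), mean(control_scores))
```

In practice the per-individual scores would be entered into a regression on the trait, with covariates, rather than compared as raw means.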
Funded by: Wellcome Trust: 064890, 081682, WT081682/Z/06/Z, WT088885/Z/09/Z
Genetic epidemiology 2010;34;2;188-93
Evoker: a visualization tool for genotype intensity data.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK.
Summary: Genome-wide association studies (GWAS), which produce huge volumes of data, are now being carried out by many groups around the world, creating a need for user-friendly tools for data quality control (QC) and analysis. One critical aspect of GWAS QC is evaluating genotype cluster plots to verify sensible genotype calling in putatively associated single nucleotide polymorphisms (SNPs). Evoker is a tool for visualizing genotype cluster plots, and provides a solution to the computational and storage problems related to working with such large datasets.
Funded by: Wellcome Trust: 089120, WT08912/Z/09/Z
Bioinformatics (Oxford, England) 2010;26;14;1786-7
Methods for Improving Genome Annotation
Knowledge-Based Bioinformatics: From Analysis to Interpretation 2010;Chapter 9;209-32
Interactions of dietary whole-grain intake with fasting glucose- and insulin-related genetic loci in individuals of European descent: a meta-analysis of 14 cohort studies.
Division of Epidemiology, Human Genetics, and Environmental Sciences, University of Texas Health Sciences Center, Houston, Houston, Texas, USA. email@example.com
Objective: Whole-grain foods are touted for multiple health benefits, including enhancing insulin sensitivity and reducing type 2 diabetes risk. Recent genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with fasting glucose and insulin concentrations in individuals free of diabetes. We tested the hypothesis that whole-grain food intake and genetic variation interact to influence concentrations of fasting glucose and insulin.
Research design and methods: Via meta-analysis of data from 14 cohorts comprising ∼48,000 participants of European descent, we studied interactions of whole-grain intake with loci previously associated in GWAS with fasting glucose (16 loci) and/or insulin (2 loci) concentrations. For tests of interaction, we considered a P value <0.0028 (0.05 divided by 18 tests) as statistically significant.
Results: Greater whole-grain food intake was associated with lower fasting glucose and insulin concentrations independent of demographics, other dietary and lifestyle factors, and BMI (β [95% CI] per 1-serving-greater whole-grain intake: -0.009 mmol/l glucose [-0.013 to -0.005], P < 0.0001 and -0.011 pmol/l [ln] insulin [-0.015 to -0.007], P = 0.0003). No interactions met our multiple testing-adjusted statistical significance threshold. The strongest SNP interaction with whole-grain intake was rs780094 (GCKR) for fasting insulin (P = 0.006), where greater whole-grain intake was associated with a smaller reduction in fasting insulin concentrations in those with the insulin-raising allele.
Conclusions: Our results support the favorable association of whole-grain intake with fasting glucose and insulin and suggest a potential interaction between variation in GCKR and whole-grain intake in influencing fasting insulin concentrations.
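The multiple-testing threshold used in this abstract follows from dividing the conventional 0.05 level by the number of interaction tests (16 glucose loci plus 2 insulin loci). The short check below reproduces that arithmetic and compares it with the strongest reported interaction; the variable names are illustrative.

```python
# Significance level divided by the number of tests (Bonferroni-style correction)
alpha = 0.05
n_glucose_loci = 16
n_insulin_loci = 2
n_tests = n_glucose_loci + n_insulin_loci

threshold = alpha / n_tests
print(f"{threshold:.4f}")  # 0.0028

# Strongest interaction reported in the Results: rs780094 (GCKR), P = 0.006
strongest_p = 0.006
print(strongest_p < threshold)  # False: does not meet the adjusted threshold
```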
Funded by: British Heart Foundation: RG/07/008/23674; Medical Research Council: G0100222, G0701863, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003, MC_UP_A620_1015; NHLBI NIH HHS: R01 HL087700; NIA NIH HHS: R01 AG032098
Diabetes care 2010;33;12;2684-91
Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes.
MitoCheck Project Group, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, D-69117 Heidelberg, Germany.
Despite our rapidly growing knowledge about the human genome, we do not know all of the genes required for some of the most basic functions of life. To start to fill this gap we developed a high-throughput phenotypic screening platform combining potent gene silencing by RNA interference, time-lapse microscopy and computational image processing. We carried out a genome-wide phenotypic profiling of each of the approximately 21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes. Phenotypes were scored quantitatively by computational image processing, which allowed us to identify hundreds of human genes involved in diverse biological functions including cell division, migration and survival. As part of the Mitocheck consortium, this study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.
Funded by: Wellcome Trust: 077192
Laser excitation power and the flow cytometric resolution of complex karyotypes.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom. firstname.lastname@example.org
The analytical resolution of individual chromosome peaks in the flow karyotype of cell lines is dependent on sample preparation and the detection sensitivity of the flow cytometer. We have investigated the effect of laser power on the resolution of chromosome peaks in cell lines with complex karyotypes. Chromosomes were prepared from a human gastric cancer cell line and a cell line from a patient with an abnormal phenotype using a modified polyamine isolation buffer. The stained chromosome suspensions were analyzed on a MoFlo sorter (Beckman Coulter) equipped with two water-cooled lasers (Coherent). A bivariate flow karyotype was obtained from each of the cell lines at various laser power settings and compared to a karyotype generated using laser power settings of 300 mW. The best separation of chromosome peaks was obtained with laser powers of 300 mW. This study demonstrates the requirement for high-laser powers for the accurate detection and purification of chromosomes, particularly from complex karyotypes, using a conventional flow cytometer.
Funded by: Wellcome Trust: 079643, WT077008
Cytometry. Part A : the journal of the International Society for Analytical Cytology 2010;77;6;585-8
The sudden dominance of blaCTX-M harbouring plasmids in Shigella spp. circulating in southern Vietnam.
The Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.
Background: Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges.
Methodology: We collected hospital admission data from the pediatric population attending the Hospital for Tropical Diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla(CTX-M) encoding plasmid.
Principal findings: We show that two different bla(CTX-M) genes are circulating in this bacterial population in this location. The sequence of one of the ESBL plasmids shows that, rather than the gene being integrated into a preexisting MDR plasmid, the bla(CTX-M) gene is located on a relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla(CTX-M-24) gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids.
Significance: The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting.
Funded by: Medical Research Council: G0300020; Wellcome Trust
PLoS neglected tropical diseases 2010;4;6;e702
Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with either one or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r(2)) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal of prioritizing relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
Funded by: Wellcome Trust
PLoS genetics 2010;6;4;e1000895
Salmonella enterica serovar Typhimurium mutants completely lacking the F(0)F(1) ATPase are novel live attenuated vaccine strains.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
The F(0)F(1) ATPase plays a central role in both the generation of ATP and the utilisation of ATP for cellular processes such as rotation of bacterial flagella. We have deleted the entire operon encoding the F(0)F(1) ATPase, as well as genes encoding individual F(0) or F(1) subunits, in Salmonella enterica serovar Typhimurium. These mutants were attenuated for virulence, as assessed by bacterial counts in the livers and spleens of intravenously infected mice. The attenuated in vivo growth of the entire atp operon mutant was complemented by the insertion of the atp operon into the malXY pseudogene region. Following clearance of the attenuated mutants from the organs, mice were protected against challenge with the virulent wild type parent strain. We have shown that the F(0)F(1) ATPase is important for bacterial growth in vivo and that atp mutants are effective live attenuated vaccines against Salmonella infection.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/S/N/2006/13095; Wellcome Trust
Synthetic associations in the context of genome-wide association scan signals.
Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester, UK.
Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with complex traits, but these only explain a small proportion of the total heritability. It has been recently proposed that rare variants can create 'synthetic association' signals in GWAS, by occurring more often in association with one of the alleles of a common tag single nucleotide polymorphism. While the ultimate evaluation of this hypothesis will require the completion of large-scale sequencing studies, it is informative to place it in the broader context of what is known about the genetic architecture of complex disease. In this review, we draw from empirical and theoretical data to summarize evidence showing that synthetic associations do not underlie many reported GWAS associations.
Funded by: Wellcome Trust: WT088885/Z/09/Z, WT089120/Z/09/Z
Human molecular genetics 2010;19;R2;R137-44
Dual RMCE for efficient re-engineering of mouse mutant alleles.
Developmental Genetics, Department of Biomedicine, University of Basel, Basel, Switzerland.
We have developed dual recombinase-mediated cassette exchange (dRMCE) to efficiently re-engineer the thousands of available conditional alleles in mouse embryonic stem cells. dRMCE takes advantage of the wild-type loxP and FRT sites present in these conditional alleles and in many gene-trap lines. dRMCE is a scalable, flexible tool to introduce tags, reporters and mutant coding regions into an endogenous locus of interest in an easy and highly efficient manner.
Funded by: Wellcome Trust: 077188
Nature methods 2010;7;11;893-5
Thioredoxin and glutathione systems differ in parasitic and free-living platyhelminths.
Cátedra de Inmunología, Facultad de Química, Instituto de Higiene, Universidad de la República, Avda. A. Navarro 3051, Montevideo, Uruguay.
Background: The thioredoxin and/or glutathione pathways occur in all organisms. They provide electrons for deoxyribonucleotide synthesis, function in antioxidant defense, detoxification and Fe/S biogenesis, and participate in a variety of cellular processes. In contrast to their mammalian hosts, platyhelminth (flatworm) parasites studied so far lack conventional thioredoxin and glutathione systems. Instead, they possess a linked thioredoxin-glutathione system with the selenocysteine-containing enzyme thioredoxin glutathione reductase (TGR) as the single redox hub that controls the overall redox homeostasis. TGR has been recently validated as a drug target for schistosomiasis and new drug leads targeting TGR have recently been identified for these platyhelminth infections that affect more than 200 million people and for which a single drug is currently available. Little is known regarding the genomic structure of flatworm TGRs, the expression of TGR variants and whether the absence of conventional thioredoxin and glutathione systems is a signature of the entire platyhelminth phylum.
Results: We examine platyhelminth genomes and transcriptomes and find that all platyhelminth parasites (from classes Cestoda and Trematoda) conform to a biochemical scenario involving, exclusively, a selenium-dependent linked thioredoxin-glutathione system having TGR as a central redox hub. In contrast, the free-living platyhelminth Schmidtea mediterranea (Class Turbellaria) possesses conventional and linked thioredoxin and glutathione systems. We identify TGR variants in Schistosoma spp. derived from a single gene, and demonstrate their expression. We also provide experimental evidence that alternative initiation of transcription and alternative transcript processing contribute to the generation of TGR variants in platyhelminth parasites.
Conclusions: Our results indicate that thioredoxin and glutathione pathways differ in parasitic and free-living flatworms and that canonical enzymes were specifically lost in the parasitic lineage. Platyhelminth parasites possess a unique and simplified redox system for diverse essential processes, and thus TGR is an excellent drug target for platyhelminth infections. Inhibition of the central redox wire hub would lead to overall disruption of redox homeostasis and disable DNA synthesis.
Funded by: FIC NIH HHS: TW006959; NIGMS NIH HHS: GM065204; Wellcome Trust: WT 085775/Z/08/Z
BMC genomics 2010;11;237
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK. email@example.com
Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.
Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.
Availability: The software is available at http://icorn.sourceforge.net
Funded by: Wellcome Trust: WT085775/Z/08/Z
Bioinformatics (Oxford, England) 2010;26;14;1704-7
New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.
Funded by: NIGMS NIH HHS: P50 GM071508; Wellcome Trust: WT 085775/Z/08/Z
Molecular microbiology 2010;76;1;12-24
The RING-CH ligase K5 antagonizes restriction of KSHV and HIV-1 particle release by mediating ubiquitin-dependent endosomal degradation of tetherin.
MRC Centre for Medical Molecular Virology, University College London, London, United Kingdom.
Tetherin (CD317/BST2) is an interferon-induced membrane protein that inhibits the release of diverse enveloped viral particles. Several mammalian viruses have evolved countermeasures that inactivate tetherin, with the prototype being the HIV-1 Vpu protein. Here we show that the human herpesvirus Kaposi's sarcoma-associated herpesvirus (KSHV) is sensitive to tetherin restriction and its activity is counteracted by the KSHV encoded RING-CH E3 ubiquitin ligase K5. Tetherin expression in KSHV-infected cells inhibits viral particle release, as does depletion of K5 protein using RNA interference. K5 induces a species-specific downregulation of human tetherin from the cell surface followed by its endosomal degradation. We show that K5 targets a single lysine (K18) in the cytoplasmic tail of tetherin for ubiquitination, leading to relocalization of tetherin to CD63-positive endosomal compartments. Tetherin degradation is dependent on ESCRT-mediated endosomal sorting, but does not require a tyrosine-based sorting signal in the tetherin cytoplasmic tail. Importantly, we also show that the ability of K5 to substitute for Vpu in HIV-1 release is entirely dependent on K18 and the RING-CH domain of K5. By contrast, while Vpu induces ubiquitination of tetherin cytoplasmic tail lysine residues, mutation of these positions has no effect on its antagonism of tetherin function, and residual tetherin is associated with the trans-Golgi network (TGN) in Vpu-expressing cells. Taken together our results demonstrate that K5 is a mechanistically distinct viral countermeasure to tetherin-mediated restriction, and that herpesvirus particle release is sensitive to this mode of antiviral inhibition.
Funded by: Medical Research Council: G0801172, G0801172(87743), G0801937, G9721629; Wellcome Trust: 076608, WT082274MA
PLoS pathogens 2010;6;4;e1000843
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. firstname.lastname@example.org
The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.
Funded by: Medical Research Council: MC_U105185859; Wellcome Trust
Cell stem cell 2010;6;4;382-95
Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing.
Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Korea.
Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3x coverage) and two Asian genomes (AK1, with 27.8x coverage and AK2, with 32.0x coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
Funded by: NHGRI NIH HHS: HG004221; Wellcome Trust: 077008, 077009, 077014
Nature genetics 2010;42;5;400-5
Using caching and optimization techniques to improve performance of the Ensembl website.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, UK. email@example.com
Background: The Ensembl web site has provided access to genomic information for almost 10 years. During this time the amount of data available through Ensembl has grown dramatically. At the same time, the World Wide Web itself has become a dramatically more important component of the scientific workflow and the way that scientists share and access data and scientific information. Since 2000, the Ensembl web interface has had three major updates and numerous smaller updates. These have largely been in response to expanding data types and valuable representations of existing data types. In 2007 it was realised that a radical new approach would be required in order to serve the project's future requirements, and development therefore focused on identifying suitable web technologies for implementation in the 2008 site redesign.
Results: By comparing the Ensembl website to well-known "Web 2.0" sites, we were able to identify two main areas in which cutting-edge technologies could be advantageously deployed: server efficiency and interface latency. We then evaluated the performance of the existing site using browser-based tools and Apache benchmarking, and selected appropriate technologies to overcome any issues found. Solutions included optimization of the Apache web server, introduction of caching technologies and widespread implementation of AJAX code. These improvements were successfully deployed on the Ensembl website in late 2008 and early 2009.
Conclusions: Web 2.0 technologies provide a flexible and efficient way to access the terabytes of data now available from Ensembl, enhancing the user experience through improved website responsiveness and a rich, interactive interface.
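The effect of caching on server efficiency can be illustrated with a small in-process memoization sketch. The abstract does not name the caching layer that was deployed, so Python's standard-library `functools.lru_cache` is used here purely as a stand-in, and `render_region` is a hypothetical function representing an expensive page fragment.

```python
from functools import lru_cache

# Counts how often the expensive work actually runs
calls = {"n": 0}


@lru_cache(maxsize=128)
def render_region(region):
    """Hypothetical expensive step, e.g. a database query plus rendering."""
    calls["n"] += 1
    return f"<div>{region}</div>"


first = render_region("1:1000-2000")   # computed
second = render_region("1:1000-2000")  # served from the cache
print(calls["n"])
```

Repeated requests for the same region then cost one computation. A shared cache across web server processes would need an external store rather than a per-process decorator.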
BMC bioinformatics 2010;11;239
A genome-wide association study identifies a novel major locus for glycemic control in type 1 diabetes, as measured by both A1C and glucose.
Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, Canada. firstname.lastname@example.org
Objective: Glycemia is a major risk factor for the development of long-term complications in type 1 diabetes; however, no specific genetic loci have been identified for glycemic control in individuals with type 1 diabetes. To identify such loci in type 1 diabetes, we analyzed longitudinal repeated measures of A1C from the Diabetes Control and Complications Trial.
Research design and methods: We performed a genome-wide association study using the mean of quarterly A1C values measured over 6.5 years, separately in the conventional (n = 667) and intensive (n = 637) treatment groups of the DCCT. At loci of interest, linear mixed models were used to take advantage of all the repeated measures. We then assessed the association of these loci with capillary glucose and repeated measures of multiple complications of diabetes.
Results: We identified a major locus for A1C levels in the conventional treatment group near SORCS1 (10q25.1, P = 7 x 10(-10)), which was also associated with mean glucose (P = 2 x 10(-5)). This was confirmed using A1C in the intensive treatment group (P = 0.01). Other loci achieved evidence close to genome-wide significance: 14q32.13 (GSC) and 9p22 (BNC2) in the combined treatment groups and 15q21.3 (WDR72) in the intensive group. Further, these loci gave evidence for association with diabetic complications, specifically SORCS1 with hypoglycemia and BNC2 with renal and retinal complications. We replicated the SORCS1 association in Genetics of Diabetes in Kidneys (GoKinD) study control subjects (P = 0.01) and the BNC2 association with A1C in nondiabetic individuals.
Conclusions: A major locus for A1C and glucose in individuals with diabetes is near SORCS1. This may influence the design and analysis of genetic studies attempting to identify risk factors for long-term diabetic complications.
Funded by: Canadian Institutes of Health Research; NIDDK NIH HHS: N01-DK-6-2204, P60-DK20595, R01-DK-077510, R01-DK077489; NIGMS NIH HHS: T32 GM007197
Antagonistic coevolution accelerates molecular evolution.
School of Biological Sciences, Biosciences Building, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK.
The Red Queen hypothesis proposes that coevolution of interacting species (such as hosts and parasites) should drive molecular evolution through continual natural selection for adaptation and counter-adaptation. Although the divergence observed at some host-resistance and parasite-infectivity genes is consistent with this, the long time periods typically required to study coevolution have so far prevented any direct empirical test. Here we show, using experimental populations of the bacterium Pseudomonas fluorescens SBW25 and its viral parasite, phage Phi2 (refs 10, 11), that the rate of molecular evolution in the phage was far higher when both bacterium and phage coevolved with each other than when phage evolved against a constant host genotype. Coevolution also resulted in far greater genetic divergence between replicate populations, which was correlated with the range of hosts that coevolved phage were able to infect. Consistent with this, the most rapidly evolving phage genes under coevolution were those involved in host infection. These results demonstrate, at both the genomic and phenotypic level, that antagonistic coevolution is a cause of rapid and divergent evolution, and is likely to be a major driver of evolutionary change within species.
Funded by: Wellcome Trust
Twenty-eight divergent polysaccharide loci specifying within- and amongst-strain capsule diversity in three strains of Bacteroides fragilis.
Centre for Infection and Immunity, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Medical Biology Centre, 97 Lisburn Road, Belfast BT9 7BL, UK. email@example.com
Comparison of the complete genome sequence of Bacteroides fragilis 638R, originally isolated in the USA, was made with two previously sequenced strains isolated in the UK (NCTC 9343) and Japan (YCH46). The presence of 10 loci containing genes associated with polysaccharide (PS) biosynthesis, each including a putative Wzx flippase and Wzy polymerase, was confirmed in all three strains, despite a lack of cross-reactivity between NCTC 9343 and 638R surface PS-specific antibodies by immunolabelling and microscopy. Genomic comparisons revealed an exceptional level of PS biosynthesis locus diversity. Of the 10 divergent PS-associated loci apparent in each strain, none is similar between NCTC 9343 and 638R. YCH46 shares one locus with NCTC 9343, confirmed by mAb labelling, and a second different locus with 638R, making a total of 28 divergent PS biosynthesis loci amongst the three strains. The lack of expression of the phase-variable large capsule (LC) in strain 638R, observed in NCTC 9343, is likely to be due to a point mutation that generates a stop codon within a putative initiating glycosyltransferase, necessary for the expression of the LC in NCTC 9343. Other major sequence differences were observed to arise from different numbers and variety of inserted extra-chromosomal elements, in particular prophages. Extensive horizontal gene transfer has occurred within these strains, despite the presence of a significant number of divergent DNA restriction and modification systems that act to prevent acquisition of foreign DNA. The level of amongst-strain diversity in PS biosynthesis loci is unprecedented.
Funded by: Wellcome Trust: 061696
Microbiology (Reading, England) 2010;156;Pt 11;3255-69
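The count of 28 divergent loci in the Bacteroides fragilis abstract above follows from 10 loci in each of three strains, minus the two loci that YCH46 shares with the other strains. A minimal sketch of that arithmetic using sets is given below; the locus labels are hypothetical placeholders, not identifiers from the paper.

```python
# Hypothetical labels: each strain carries 10 PS biosynthesis loci.
# These names are placeholders for illustration only.
nctc9343 = {f"NCTC9343_PS{i}" for i in range(1, 11)}
s638r = {f"638R_PS{i}" for i in range(1, 11)}

# YCH46: 8 loci found only in YCH46, plus one locus shared with
# NCTC 9343 and a second, different locus shared with 638R.
ych46 = {f"YCH46_PS{i}" for i in range(1, 9)}
ych46 |= {"NCTC9343_PS1", "638R_PS1"}

# NCTC 9343 and 638R have no locus in common.
distinct_loci = nctc9343 | s638r | ych46
print(len(distinct_loci))  # 10 + 10 + 10 - 2 shared = 28
```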
Genetic evidence that raised sex hormone binding globulin (SHBG) levels reduce the risk of type 2 diabetes.
Genetics of Complex Traits, Peninsula College of Medicine and Dentistry, University of Exeter, Magdalen Road, Exeter, UK.
Epidemiological studies consistently show that circulating sex hormone binding globulin (SHBG) levels are lower in type 2 diabetes patients than non-diabetic individuals, but the causal nature of this association is controversial. Genetic studies can help dissect causal directions of epidemiological associations because genotypes are much less likely to be confounded, biased or influenced by disease processes. Using this Mendelian randomization principle, we selected a common single nucleotide polymorphism (SNP) near the SHBG gene, rs1799941, that is strongly associated with SHBG levels. We used data from this SNP, or closely correlated SNPs, in 27 657 type 2 diabetes patients and 58 481 controls from 15 studies. We then used data from additional studies to estimate the difference in SHBG levels between type 2 diabetes patients and controls. The SHBG SNP rs1799941 was associated with type 2 diabetes [odds ratio (OR) 0.94, 95% CI: 0.91, 0.97; P = 2 x 10(-5)], with the SHBG raising allele associated with reduced risk of type 2 diabetes. This effect was very similar to that expected (OR 0.92, 95% CI: 0.88, 0.96), given the SHBG-SNP versus SHBG levels association (SHBG levels are 0.2 standard deviations higher per copy of the A allele) and the SHBG levels versus type 2 diabetes association (SHBG levels are 0.23 standard deviations lower in type 2 diabetic patients compared to controls). Results were very similar in men and women. There was no evidence that this variant is associated with diabetes-related intermediate traits, including several measures of insulin secretion and resistance. Our results, together with those from another recent genetic study, strengthen evidence that SHBG and sex hormones are involved in the aetiology of type 2 diabetes.
Funded by: Department of Health: DHCS/07/07/008; Intramural NIH HHS; Medical Research Council: G0000649, G016121, G0601261, MC_U106179471; NHGRI NIH HHS: 1 Z01 HG000024; NIA NIH HHS: R01 AG24233-0; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193; Wellcome Trust: 076113, 077016/Z/05/Z, 083270/Z/07/Z, 090532, GR072960
Human molecular genetics 2010;19;3;535-44
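The Mendelian randomization logic in the SHBG abstract above can be illustrated with the ratio (Wald) estimator, which rescales the SNP-outcome effect by the SNP-exposure effect. This is a sketch of a standard related calculation, not the comparison of observed and expected odds ratios that the authors report. It uses only the rounded figures quoted in the abstract and ignores uncertainty in the SNP-SHBG effect.

```python
import math

# Figures quoted in the abstract (rounded).
or_snp_t2d = 0.94            # per-allele odds ratio for type 2 diabetes
ci_low, ci_high = 0.91, 0.97
beta_snp_shbg = 0.2          # SD increase in SHBG per copy of the A allele


def wald_ratio(or_outcome: float, beta_exposure: float) -> float:
    """Odds ratio for the outcome per 1 SD increase in the exposure."""
    return math.exp(math.log(or_outcome) / beta_exposure)


or_per_sd = wald_ratio(or_snp_t2d, beta_snp_shbg)
or_per_sd_low = wald_ratio(ci_low, beta_snp_shbg)
or_per_sd_high = wald_ratio(ci_high, beta_snp_shbg)

print(f"Implied OR per 1 SD higher SHBG: {or_per_sd:.2f} "
      f"({or_per_sd_low:.2f} to {or_per_sd_high:.2f})")
```

Because the interval here rescales only the SNP-diabetes confidence limits, it understates the true uncertainty of the estimate.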
The Citrobacter rodentium genome sequence reveals convergent evolution with human pathogenic Escherichia coli.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.
Funded by: Medical Research Council: G0700823
Journal of bacteriology 2010;192;2;525-38
A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. firstname.lastname@example.org
A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.
Funded by: Wellcome Trust
Journal of bacteriology 2010;192;21;5746-54
Metamotifs--a generative model for building families of nucleotide position weight matrices.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. email@example.com
Background: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence.
Results: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input position weight matrix (PWM) motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.
Conclusions: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity to infer motifs. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with them. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
Funded by: Wellcome Trust: 077198, 077198/Z/05/Z
BMC bioinformatics 2010;11;348
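To make the idea of a generative model over PWM families in the Metamotifs abstract above more concrete, the sketch below draws PWMs from a family-level description. It assumes that each metamotif column can be represented as a Dirichlet distribution over the four nucleotide weights. The abstract itself does not state this parameterisation, and the function names and parameter values are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_pwm(metamotif, rng):
    """Sample one PWM: one nucleotide distribution per metamotif column."""
    return [sample_dirichlet(column, rng) for column in metamotif]


# Hypothetical three-column family: strongly A, strongly G, then uninformative.
metamotif = [
    [20.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 20.0, 1.0],
    [5.0, 5.0, 5.0, 5.0],
]

rng = random.Random(0)
pwm = sample_pwm(metamotif, rng)
for column in pwm:
    print({n: round(w, 2) for n, w in zip(NUCLEOTIDES, column)})
```

Repeated calls to `sample_pwm` give PWMs that differ from one another but share the same column-wise tendencies, which is the sense in which such a model describes a motif family.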
iMotifs: an integrated sequence motif visualization and analysis environment.
Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. firstname.lastname@example.org
Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have been recently developed for detecting regulatory factor binding sites in vivo and in vitro and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.
Availability: iMotifs is licensed with the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library libxms for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.
Funded by: Wellcome Trust: 077198, 077198/Z/05/Z
Bioinformatics (Oxford, England) 2010;26;6;843-4
A comprehensive catalogue of somatic mutations from a human cancer genome.
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.
Funded by: Wellcome Trust: 077012/Z/05/Z, 088340, 093867
A small-cell lung cancer genome with complex signatures of tobacco exposure.
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and another two lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.
Funded by: NCI NIH HHS: P50 CA070907, P50CA70907; Wellcome Trust: 077012, 077012/Z/05/Z, 088340, 093867
PARK2 deletions occur frequently in sporadic colorectal cancer and accelerate adenoma development in Apc mutant mice.
Department of Pathology, University of Cambridge, Cambridge CB2 0QQ, United Kingdom.
In 100 primary colorectal carcinomas, we demonstrate by array comparative genomic hybridization (aCGH) that 33% show DNA copy number (DCN) loss involving PARK2, the gene encoding PARKIN, the E3 ubiquitin ligase whose deficiency is responsible for a form of autosomal recessive juvenile parkinsonism. PARK2 is located on chromosome 6 (at 6q25-27), a chromosome with one of the lowest overall frequencies of DNA copy number alterations recorded in colorectal cancers. The PARK2 deletions are mostly focal (31% approximately 0.5 Mb on average), heterozygous, and show maximum incidence in exons 3 and 4. As PARK2 lies within FRA6E, a large common fragile site, it has been argued that the observed DCN losses in PARK2 in cancer may represent merely the result of enforced replication of locally vulnerable DNA. However, we show that deficiency in expression of PARK2 is significantly associated with adenomatous polyposis coli (APC) deficiency in human colorectal cancer. Evidence of some PARK2 mutations and promoter hypermethylation is described. PARK2 overexpression inhibits cell proliferation in vitro. Moreover, interbreeding of Park2 heterozygous knockout mice with Apc(Min) mice resulted in a dramatic acceleration of intestinal adenoma development and increased polyp multiplicity. We conclude that PARK2 is a tumor suppressor gene whose haploinsufficiency cooperates with mutant APC in colorectal carcinogenesis.
Funded by: Cancer Research UK: 12401
Proceedings of the National Academy of Sciences of the United States of America 2010;107;34;15145-50
Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes.
Department of Nutrition, Harvard School of Public Health, and Brigham and Women's Hospital, Boston, MA, USA. email@example.com
To identify type 2 diabetes (T2D) susceptibility loci, we conducted genome-wide association (GWA) scans in nested case-control samples from two prospective cohort studies, including 2591 patients and 3052 controls of European ancestry. Validation was performed in 11 independent GWA studies of 10,870 cases and 73,735 controls. We identified significantly associated variants near RBMS1 and ITGB6 genes at 2q24, best-represented by SNP rs7593730 (combined OR=0.90, 95% CI=0.86-0.93; P=3.7x10(-8)). The frequency of the risk-lowering allele T is 0.23. Variants in this region were nominally related to lower fasting glucose and HOMA-IR in the MAGIC consortium (P<0.05). These data suggest that the 2q24 locus may influence the T2D risk by affecting glucose metabolism and insulin resistance.
Funded by: NCI NIH HHS: CA047988, CA136792, CA54281, CA63464, P01CA089392, P01CA055075, P01CA087969, Z01CP010200; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: U01HG004436, U01HG004399, U01HG004402, U01HG004415, U01HG004422, U01HG004423, U01HG004438, U01HG004446, U01HG004729, U01HG004726, U01HG004728, U01HG004735, U01HG004738, U01HG04424; NHLBI NIH HHS: HL043851, HL69757, N01-HC-55022, N01-HC-55018, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55019, N01-HC-55020, N01-HC-55021, N02-HL-6-427, R01 HL071981, R01 HL71981, R01HL086694, R01HL087641, R01HL59367; NIAAA NIH HHS: U10AA008401; NIDA NIH HHS: R01DA013423; NIDCR NIH HHS: U01DE018993, U01DE018903; NIDDK NIH HHS: DK46200, K01-DK067207, K23 DK65978, K24 DK080140, R01DK058845, R01DK075046, R01DK078616, R90DK071507, T90 DK070078, T90 DK070078-05; NIEHS NIH HHS: T32 ES016645; PHS HHS: HHSN268200625226C, HHSN268200782096C, RFAHG006033
Human molecular genetics 2010;19;13;2706-15
A human gut microbial gene catalogue established by metagenomic sequencing.
BGI-Shenzhen, Shenzhen 518083, China.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, approximately 150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.
PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton-Cambridge CB10 1SA, UK.
Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability.
Funded by: Wellcome Trust: 077186, 079643
Science (New York, N.Y.) 2010;330;6007;1104-7
MEROPS: the peptidase database.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. firstname.lastname@example.org
Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The database has a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The classification framework is used for attaching information at each level. An important focus of the database has become distinguishing one peptidase from another through identifying the specificity of the peptidase in terms of where it will cleave substrates and with which inhibitors it will interact. We have collected over 39,000 known cleavage sites in proteins, peptides and synthetic substrates. These allow us to display peptidase specificity and alignments of protein substrates to give an indication of how well a cleavage site is conserved, and thus its probable physiological relevance. While the number of new peptidase families and clans has grown only slowly, the number of complete genomes has greatly increased. This has allowed us to add an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species.
Funded by: Wellcome Trust: WT077044/Z/05/Z
Nucleic acids research 2010;38;Database issue;D227-33
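The hierarchy described in the MEROPS abstract above (homologous sequences grouped into protein species, protein species into families, families into clans) maps naturally onto nested records. The sketch below is illustrative only. The class names, fields and labels are hypothetical and do not reflect the MEROPS schema or any real MEROPS identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinSpecies:
    name: str
    sequence_ids: List[str] = field(default_factory=list)  # homologous set


@dataclass
class Family:
    name: str
    members: List[ProteinSpecies] = field(default_factory=list)


@dataclass
class Clan:
    name: str
    families: List[Family] = field(default_factory=list)

    def protein_species(self) -> List[ProteinSpecies]:
        """Flatten the hierarchy to every protein species in the clan."""
        return [sp for fam in self.families for sp in fam.members]


# Placeholder labels, not real MEROPS identifiers.
clan = Clan("clan-X", [
    Family("family-X1", [
        ProteinSpecies("peptidase-a", ["seq1", "seq2"]),
        ProteinSpecies("peptidase-b", ["seq3"]),
    ]),
    Family("family-X2", [ProteinSpecies("peptidase-c", ["seq4"])]),
])
print([sp.name for sp in clan.protein_species()])
```

Attaching information at each level, as the abstract describes, would then amount to adding fields to the class for that level.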
CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion.
Wellcome Trust Sanger Institute, Cambridge, United Kingdom. email@example.com
Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom.
Methodology/principal findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication.
Conclusions/significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/.
Funded by: Biotechnology and Biological Sciences Research Council
PloS one 2010;5;6;e10908
Genome-wide association study identifies five loci associated with lung function.
Departments of Health Sciences and Genetics, Adrian Building, University of Leicester, Leicester, UK.
Pulmonary function measures are heritable traits that predict morbidity and mortality and define chronic obstructive pulmonary disease (COPD). We tested genome-wide association with forced expiratory volume in 1 s (FEV(1)) and the ratio of FEV(1) to forced vital capacity (FVC) in the SpiroMeta consortium (n = 20,288 individuals of European ancestry). We conducted a meta-analysis of top signals with data from direct genotyping (n ≤ 32,184 additional individuals) and in silico summary association data from the CHARGE Consortium (n = 21,209) and the Health 2000 survey (n ≤ 883). We confirmed the reported locus at 4q31 and identified associations with FEV(1) or FEV(1)/FVC and common variants at five additional loci: 2q35 in TNS1 (P = 1.11 x 10(-12)), 4q24 in GSTCD (P = 2.18 x 10(-23)), 5q33 in HTR4 (P = 4.29 x 10(-9)), 6p21 in AGER (P = 3.07 x 10(-15)) and 15q23 in THSD4 (P = 7.24 x 10(-15)). mRNA analyses showed expression of TNS1, GSTCD, AGER, HTR4 and THSD4 in human lung tissue. These associations offer mechanistic insight into pulmonary function regulation and indicate potential targets for interventions to alleviate respiratory disease.
Funded by: Biotechnology and Biological Sciences Research Council; British Heart Foundation: PG/06/154/22043, PG/97012, RG/08/013/25942; Cancer Research UK; Chief Scientist Office: CZB/4/710, CZD/16/6/2, CZD/16/6/4; Department of Health: 0020029; Medical Research Council: G0000934, G0000943, G0401540, G0500539, G0501942, G0600331, G0600705, G0800582, G0801056, G0902125, G9815508, G990146, MC_U106179471, MC_U106188470, MC_U123092720, MC_U123092721, MC_U127561128, MC_UP_A620_1014, U.1230.00.008.00005.02; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01, 5R01MH63706:02; Wellcome Trust: 068545/Z/02, 075883, 076113/B/04/Z, 077016/Z/05/Z, 079895, 086160/Z/08/A
Nature genetics 2010;42;1;36-44
Using randomised vectors in transcription factor binding site predictions
Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010;5708880
A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses.
Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland. samuli.ripatti@fimm.fi
Background: Comparison of patients with coronary heart disease and controls in genome-wide association studies has revealed several single nucleotide polymorphisms (SNPs) associated with coronary heart disease. We aimed to establish the external validity of these findings and to obtain more precise risk estimates using a prospective cohort design.
Methods: We tested 13 recently discovered SNPs for association with coronary heart disease in a case-control design including participants differing from those in the discovery samples (3829 participants with prevalent coronary heart disease and 48,897 controls free of the disease) and a prospective cohort design including 30,725 participants free of cardiovascular disease from Finland and Sweden. We modelled the 13 SNPs as a multilocus genetic risk score and used Cox proportional hazards models to estimate the association of genetic risk score with incident coronary heart disease. For case-control analyses we analysed associations between individual SNPs and quintiles of genetic risk score using logistic regression.
Findings: In prospective cohort analyses, 1264 participants had a first coronary heart disease event during a median 10·7 years' follow-up (IQR 6·7-13·6). Genetic risk score was associated with a first coronary heart disease event. When compared with the bottom quintile of genetic risk score, participants in the top quintile were at 1·66-times increased risk of coronary heart disease in a model adjusting for traditional risk factors (95% CI 1·35-2·04, p value for linear trend=7·3×10(-10)). Adjustment for family history did not change these estimates. Genetic risk score did not improve C index over traditional risk factors and family history (p=0·19), nor did it have a significant effect on net reclassification improvement (2·2%, p=0·18); however, it did have a small effect on integrated discrimination index (0·004, p=0·0006). Results of the case-control analyses were similar to those of the prospective cohort analyses.
Interpretation: Using a genetic risk score based on 13 SNPs associated with coronary heart disease, we can identify the 20% of individuals of European ancestry who are at roughly 70% increased risk of a first coronary heart disease event. The potential clinical use of this panel of SNPs remains to be defined.
Funding: The Wellcome Trust; Academy of Finland Center of Excellence for Complex Disease Genetics; US National Institutes of Health; the Donovan Family Foundation.
Funded by: NHLBI NIH HHS: R01 HL087676; Wellcome Trust: WT089061/Z/09/Z, WT089062/Z/09/Z
Lancet (London, England) 2010;376;9750;1393-400
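The multilocus genetic risk score used in the coronary heart disease study above can be sketched as a sum of risk-allele counts across the 13 SNPs, with participants then grouped into quintiles of the score. The abstract does not say whether the score was weighted, so the optional `weights` argument here is an assumption. The genotypes are randomly simulated for illustration and are not study data.

```python
import random
from statistics import quantiles

N_SNPS = 13


def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum of risk-allele counts (0, 1 or 2 per SNP), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


def quintile_of(score, cut_points):
    """Return 1..5 given the four cut points that separate the quintiles."""
    return 1 + sum(score > c for c in cut_points)


# Simulated cohort: 1000 individuals, 13 SNPs each (illustrative only).
rng = random.Random(42)
cohort = [[rng.choice((0, 1, 2)) for _ in range(N_SNPS)] for _ in range(1000)]

scores = [genetic_risk_score(genotype) for genotype in cohort]
cut_points = quantiles(scores, n=5)  # four cut points
quintile_labels = [quintile_of(s, cut_points) for s in scores]
print(cut_points)
```

In the study itself, the association between the score and incident disease was then estimated with Cox proportional hazards models, a step this sketch does not attempt.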
Genomic architecture characterizes tumor progression paths and fate in breast cancer patients.
Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway.
Distinct molecular subtypes of breast carcinomas have been identified, but translation into clinical use has been limited. We have developed two platform-independent algorithms to explore genomic architectural distortion using array comparative genomic hybridization data to measure (i) whole-arm gains and losses [whole-arm aberration index (WAAI)] and (ii) complex rearrangements [complex arm aberration index (CAAI)]. By applying CAAI and WAAI to data from 595 breast cancer patients, we were able to separate the cases into eight subgroups with different distributions of genomic distortion. Within each subgroup, data from expression analyses, sequencing and ploidy indicated that progression occurs along separate paths into more complex genotypes. Histological grade had prognostic impact only in the luminal-related groups, whereas the complexity identified by CAAI had an overall independent prognostic power. This study emphasizes the relation among structural genomic alterations, molecular subtype, and clinical behavior and shows that an objective score of genomic complexity (CAAI) is an independent prognostic marker in breast cancer.
Funded by: Cancer Research UK: C507/A3086
Science translational medicine 2010;2;38;38ra47
Three authors reply
American Journal of Epidemiology. 2010;171;1155-6
Association of the 9p21.3 locus with risk of first-ever myocardial infarction in Pakistanis: case-control study in South Asia and updated meta-analysis of Europeans.
Center for Non-Communicable Diseases, Karachi, Pakistan. firstname.lastname@example.org
Objective: To examine variants at the 9p21 locus in a case-control study of acute myocardial infarction (MI) in Pakistanis and to perform an updated meta-analysis of published studies in people of European ancestry.
Methods and results: A total of 1851 patients with first-ever confirmed MI and 1903 controls were genotyped for 89 tagging single-nucleotide polymorphisms at locus 9p21, including the lead variant (rs1333049) identified by the Wellcome Trust Case Control Consortium. Minor allele frequencies and extent of linkage disequilibrium observed in Pakistanis were broadly similar to those seen in Europeans. In the Pakistani study, 6 variants were associated with MI (P<10(-2)) in the initial sample set, and in an additional 741 cases and 674 controls in whom further genotyping was performed for these variants. For Pakistanis, the odds ratio for MI was 1.13 (95% CI, 1.05 to 1.22; P=2 x 10(-3)) for each copy of the C allele at rs1333049. In comparison, a meta-analysis of studies in Europeans yielded an odds ratio of 1.31 (95% CI, 1.26 to 1.37) for the same variant (P=1 x 10(-3) for heterogeneity). Meta-analyses of 23 variants, in up to 38,250 cases and 84,820 controls generally yielded higher values in Europeans than in Pakistanis.
Conclusions: To our knowledge, this study provides the first demonstration that variants at the 9p21 locus are significantly associated with MI risk in Pakistanis. However, association signals at this locus were weaker in Pakistanis than those in European studies.
Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust
Arteriosclerosis, thrombosis, and vascular biology 2010;30;7;1467-73
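The comparison of Pakistani and European odds ratios for rs1333049 in the 9p21.3 study above can be illustrated with a simple two-group heterogeneity test on the log odds ratios. This is a sketch under stated assumptions. Standard errors are recovered from the rounded 95% confidence intervals in the abstract, and a z-test on the difference is used, which may differ from the authors' own heterogeneity statistic.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log odds ratio and its standard error, recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(odds_ratio), se


def heterogeneity_p(estimate_a, estimate_b):
    """Two-sided P value for the difference between two log odds ratios."""
    (beta_a, se_a), (beta_b, se_b) = estimate_a, estimate_b
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


pakistani = log_or_and_se(1.13, 1.05, 1.22)
european = log_or_and_se(1.31, 1.26, 1.37)

p_het = heterogeneity_p(pakistani, european)
print(f"P for heterogeneity: {p_het:.1e}")
```

With these rounded inputs the result is on the order of 10(-3), broadly in line with the heterogeneity P value quoted in the abstract.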
Genetic determinants of major blood lipids in Pakistanis compared with Europeans.
Center for Non-Communicable Diseases Karachi, Pakistan. email@example.com
Background: Evidence is sparse about the genetic determinants of major lipids in Pakistanis.
Methods and results: Variants (n=45 000) across 2000 genes were assessed in 3200 Pakistanis and compared with 2450 Germans using the same gene array and similar lipid assays. We also did a meta-analysis of selected lipid-related variants in Europeans. Pakistani genetic architecture was distinct from that of several ethnic groups represented in international reference samples. Forty-one variants at 14 loci were significantly associated with levels of HDL-C, triglyceride, or LDL-C. The most significant lipid-related variants identified among Pakistanis corresponded to genes previously shown to be relevant to Europeans, such as CETP associated with HDL-C levels (rs711752; P<10(-13)), APOA5/ZNF259 (rs651821; P<10(-13)) and GCKR (rs1260326; P<10(-13)) with triglyceride levels; and CELSR2 variants with LDL-C levels (rs646776; P<10(-9)). For Pakistanis, these 41 variants explained 6.2%, 7.1%, and 0.9% of the variation in HDL-C, triglyceride, and LDL-C, respectively. Compared with Europeans, the allele frequency of rs662799 in APOA5 among Pakistanis was higher and its impact on triglyceride concentration was greater (P-value for difference <10(-4)).
Conclusions: Several lipid-related genetic variants are common to Pakistanis and Europeans, though they explain only a modest proportion of population variation in lipid concentration. Allelic frequencies and effect sizes of lipid-related variants can differ between Pakistanis and Europeans.
Funded by: British Heart Foundation; Medical Research Council; Wellcome Trust
Circulation. Cardiovascular genetics 2010;3;4;348-57
Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge.
Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Glucose levels 2 h after an oral glucose challenge are a clinical measure of glucose tolerance used in the diagnosis of type 2 diabetes. We report a meta-analysis of nine genome-wide association studies (n = 15,234 nondiabetic individuals) and a follow-up of 29 independent loci (n = 6,958-30,620). We identify variants at the GIPR locus associated with 2-h glucose level (rs10423928, beta (s.e.m.) = 0.09 (0.01) mmol/l per A allele, P = 2.0 x 10(-15)). The GIPR A-allele carriers also showed decreased insulin secretion (n = 22,492; insulinogenic index, P = 1.0 x 10(-17); ratio of insulin to glucose area under the curve, P = 1.3 x 10(-16)) and diminished incretin effect (n = 804; P = 4.3 x 10(-4)). We also identified variants at ADCY5 (rs2877716, P = 4.2 x 10(-16)), VPS13C (rs17271305, P = 4.1 x 10(-8)), GCKR (rs1260326, P = 7.1 x 10(-11)) and TCF7L2 (rs7903146, P = 4.2 x 10(-10)) associated with 2-h glucose. Of the three newly implicated loci (GIPR, ADCY5 and VPS13C), only ADCY5 was found to be associated with type 2 diabetes in collaborating studies (n = 35,869 cases, 89,798 controls, OR = 1.12, 95% CI 1.09-1.15, P = 4.8 x 10(-18)).
Funded by: British Heart Foundation: RG/07/008/23674; Chief Scientist Office: CZB/4/710; Intramural NIH HHS: Z01 HG000024-14; Medical Research Council: G0100222, G0600331, G0701863, G0902037, G19/35, G8802774, MC_U106179471, MC_U106188470, MC_UP_A620_1014, MC_UP_A620_1015; NCI NIH HHS: P01 CA087969, P01 CA087969-12; NCRR NIH HHS: M01 RR000052, M01 RR000052-46, M01 RR001066-26, M01 RR016500, M01 RR016500-08; NHGRI NIH HHS: U01 HG004399, U01 HG004399-02, U01 HG004402, U01 HG004402-02; NHLBI NIH HHS: N01 HC015103, N01 HC025195, N01 HC035129, N01 HC045133, N01 HC055015, N01 HC055016, N01 HC055018, N01 HC055019, N01 HC055020, N01 HC055021, N01 HC055022, N01 HC055222, N01 HC075150, N01 HC085079, N01 HC085080, N01 HC085081, N01 HC085082, N01 HC085083, N01 HC085084, N01 HC085085, N01 HC085086, N02 HL64278, R01 HL036310, R01 HL036310-21, R01 HL059367, R01 HL059367-10, R01 HL086694, R01 HL086694-03, R01 HL087641, R01 HL087641-03, R01 HL087652, R01 HL087652-03, U01 HL072515, U01 HL072515-06, U01 HL080295, U01 HL080295-04; NIA NIH HHS: R01 AG013196, R01 AG013196-16; NIDA NIH HHS: U54 DA021519, U54 DA021519-04; NIDDK NIH HHS: K23 DK065978, K23 DK065978-05, K24 DK080140, K24 DK080140-04, P30 DK072488, P30 DK072488-06, P30 DK079637, P60 DK079637, P60 DK079637-04, R01 DK029867, R01 DK054261, R01 DK054261-09, R01 DK058845, R01 DK058845-11, R01 DK062370, R01 DK062370-05, R01 DK069922, R01 DK069922-03, R01 DK072193, R01 DK072193-04, R01 DK078616, R01 DK078616-03, R01 DK091718; Wellcome Trust: 077016, 088885
Nature genetics 2010;42;2;142-8
Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding.
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Transcription factors (TFs) direct gene expression by binding to DNA regulatory regions. To explore the evolution of gene regulation, we used chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) to determine experimentally the genome-wide occupancy of two TFs, CCAAT/enhancer-binding protein alpha and hepatocyte nuclear factor 4 alpha, in the livers of five vertebrates. Although each TF displays highly conserved DNA binding preferences, most binding is species-specific, and aligned binding events present in all five species are rare. Regions near genes with expression levels that are dependent on a TF are often bound by the TF in multiple species yet show no enhanced DNA sequence constraint. Binding divergence between species can be largely explained by sequence changes to the bound motifs. Among the binding events lost in one lineage, only half are recovered by another binding event within 10 kilobases. Our results reveal large interspecies differences in transcriptional regulation and provide insight into regulatory evolution.
Funded by: Cancer Research UK: 15603, A15603; European Research Council: 202218; Wellcome Trust: 062023, 079643, WT062023, WT079643
Science (New York, N.Y.) 2010;328;5981;1036-40
CHD7 targets active gene enhancer elements to modulate ES cell-specific gene expression.
Department of Genetics, Case Western Reserve University, Cleveland, Ohio, United States of America.
CHD7 is one of nine members of the chromodomain helicase DNA-binding domain family of ATP-dependent chromatin remodeling enzymes found in mammalian cells. De novo mutation of CHD7 is a major cause of CHARGE syndrome, a genetic condition characterized by multiple congenital anomalies. To gain insight into the function of CHD7, we used the technique of chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) to map CHD7 sites in mouse ES cells. We identified 10,483 sites on chromatin bound by CHD7 at high confidence. Most of the CHD7 sites show features of gene enhancer elements. Specifically, CHD7 sites are predominantly located distal to transcription start sites, contain high levels of H3K4 mono-methylation, are found within open chromatin that is hypersensitive to DNase I digestion, and correlate with ES cell-specific gene expression. Moreover, CHD7 co-localizes with P300, a known enhancer-binding protein and strong predictor of enhancer activity. Correlations with 18 other factors mapped by ChIP-seq in mouse ES cells indicate that CHD7 also co-localizes with ES cell master regulators OCT4, SOX2, and NANOG. Correlations between CHD7 sites and global gene expression profiles obtained from Chd7(+/+), Chd7(+/-), and Chd7(-/-) ES cells indicate that CHD7 functions at enhancers as a transcriptional rheostat to modulate, or fine-tune, the expression levels of ES-specific genes. CHD7 can modulate genes in either the positive or negative direction, although negative regulation appears to be the more direct effect of CHD7 binding. These data indicate that enhancer-binding proteins can limit gene expression and are not necessarily co-activators. Although ES cells are not likely to be affected in CHARGE syndrome, we propose that enhancer-mediated gene dysregulation contributes to disease pathogenesis and that the critical CHD7 target genes may be subject to positive or negative regulation.
Funded by: Medical Research Council: G0800024, MC_U120027516; NHGRI NIH HHS: 1U54HG004557-01, R01 HG004456-01, R01HG003521-01, R01HG004722; NICHD NIH HHS: R01HD056369, T32 HD007104
PLoS genetics 2010;6;7;e1001023
An investigation of clinical and immunological events following repeated aerodigestive tract challenge infections with live Mycobacterium bovis Bacille Calmette Guérin.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom. firstname.lastname@example.org
Bacille Calmette Guérin substrain Moreau Rio de Janeiro is an attenuated strain of Mycobacterium bovis that has been used extensively as an oral tuberculosis vaccine. We assessed its potential as a challenge model to study clinical and immunological events following repeated mycobacterial gut infection. Seven individuals received three oral challenges with approximately 10(7) viable bacilli. Clinical symptoms, T-cell responses and gene expression patterns in peripheral blood were monitored. Clinical symptoms were relatively mild and declined following each oral challenge. Delayed T-cell responses were observed, and limited differential gene expression detected by microarrays. Oral challenge with BCG Moreau Rio de Janeiro vaccine was immunogenic in healthy volunteers, limiting its potential to explore clinical innate immune responses, but with low reactogenicity.
Funded by: Wellcome Trust
Legionella pneumophila strain 130b possesses a unique combination of type IV secretion systems and novel Dot/Icm secretion system effector proteins.
Centre for Molecular Microbiology and Infection, Division of Cell and Molecular Biology, Imperial College, London, United Kingdom.
Legionella pneumophila is a ubiquitous inhabitant of environmental water reservoirs. The bacteria infect a wide variety of protozoa and, after accidental inhalation, human alveolar macrophages, which can lead to severe pneumonia. The capability to thrive in phagocytic hosts is dependent on the Dot/Icm type IV secretion system (T4SS), which translocates multiple effector proteins into the host cell. In this study, we determined the draft genome sequence of L. pneumophila strain 130b (Wadsworth). We found that the 130b genome encodes a unique set of T4SSs, namely, the Dot/Icm T4SS, a Trb-1-like T4SS, and two Lvh T4SS gene clusters. Sequence analysis substantiated that a core set of 107 Dot/Icm T4SS effectors was conserved among the sequenced L. pneumophila strains Philadelphia-1, Lens, Paris, Corby, Alcoy, and 130b. We also identified new effector candidates and validated the translocation of 10 novel Dot/Icm T4SS effectors that are not present in L. pneumophila strain Philadelphia-1. We examined the prevalence of the new effector genes among 87 environmental and clinical L. pneumophila isolates. Five of the new effectors were identified in 34 to 62% of the isolates, while less than 15% of the strains tested positive for the other five genes. Collectively, our data show that the core set of conserved Dot/Icm T4SS effector proteins is supplemented by a variable repertoire of accessory effectors that may partly account for differences in the virulences and prevalences of particular L. pneumophila strains.
Funded by: Medical Research Council: G0700823; Wellcome Trust
Journal of bacteriology 2010;192;22;6001-16
Natural history of Christianson syndrome.
Greenwood Genetic Center, Greenwood, South Carolina, USA. email@example.com
Christianson syndrome is an X-linked mental retardation syndrome characterized by microcephaly, impaired ocular movement, severe global developmental delay, hypotonia which progresses to spasticity, and early onset seizures of variable types. Gilfillan et al. [2008] reported mutations in SLC9A6, the gene encoding the sodium/hydrogen exchanger NHE6, in the family first reported and in three others. They also noted the clinical similarities to Angelman syndrome and found cerebellar atrophy on MRI and elevated glutamate/glutamine in the basal ganglia on MRS. Here we report on nonsense mutations in two additional families. The natural history is detailed in childhood and adult life, the similarities to Angelman syndrome confirmed, and the MRI/MRS findings documented in three affected boys.
American journal of medical genetics. Part A 2010;152A;11;2775-83
Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946.
The Sanger Institute, Hinxton, Cambridge, United Kingdom.
Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related enterobacteria revealed signs of pathoadaptation to rosaceous hosts.
Journal of bacteriology 2010;192;7;2020-1
Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits.
Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and approximately 1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
PLoS genetics 2010;6;8
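The gene-set enrichment idea described in the MAGENTA abstract above (variant p-values combined into gene scores, then a test of whether a gene set is over-represented among the top-scoring genes) can be sketched as follows. This is a simplified illustration, not the MAGENTA software: it scores each gene by its best variant p-value, omits the correction for gene size, variant number and linkage disequilibrium, and uses a plain permutation test. The function names, the 5% cutoff and the synthetic data are assumptions.

```python
import random

def gene_scores(variant_pvalues):
    """Score each gene by its most significant variant p-value (lower is stronger)."""
    return {gene: min(pvals) for gene, pvals in variant_pvalues.items()}

def enrichment_p(scores, gene_set, top_fraction=0.05, n_perm=2000, seed=0):
    """Permutation p-value for over-representation of gene_set among top-scoring genes."""
    genes = sorted(scores, key=scores.get)           # best-scoring genes first
    n_top = max(1, int(len(genes) * top_fraction))   # size of the "top" group
    top = set(genes[:n_top])
    members = [g for g in gene_set if g in scores]
    observed = sum(g in top for g in members)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Random gene sets of the same size give the null distribution.
        sample = rng.sample(genes, len(members))
        if sum(g in top for g in sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic example: 200 genes, two variants each, with gene g0 the strongest.
variant_pvalues = {f"g{i}": [(i + 1) / 201, 0.9] for i in range(200)}
scores = gene_scores(variant_pvalues)
obs_hit, p_hit = enrichment_p(scores, [f"g{i}" for i in range(5)])
obs_null, p_null = enrichment_p(scores, [f"g{i}" for i in range(100, 105)])
print(obs_hit, p_hit, obs_null, p_null)
```

A set drawn from the strongest genes yields a small permutation p-value, while a set with no members in the top group yields a p-value of 1, which is the pattern of a null result such as the one the abstract reports.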
A worldwide survey of human male demographic history based on Y-SNP and Y-STR data from the HGDP-CEPH populations.
The Wellcome Trust Sanger Institute, Hinxton, Cambs., United Kingdom.
We have investigated human male demographic history using 590 males from 51 populations in the Human Genome Diversity Project - Centre d'Etude du Polymorphisme Humain worldwide panel, typed with 37 Y-chromosomal Single Nucleotide Polymorphisms and 65 Y-chromosomal Short Tandem Repeats and analyzed with the program Bayesian Analysis of Trees With Internal Node Generation. The general patterns we observe show a gradient from the oldest population time to the most recent common ancestors (TMRCAs) and expansion times together with the largest effective population sizes in Africa, to the youngest times and smallest effective population sizes in the Americas. These parameters are significantly negatively correlated with distance from East Africa, and the patterns are consistent with most other studies of human variation and history. In contrast, growth rate showed a weaker correlation in the opposite direction. Y-lineage diversity and TMRCA also decrease with distance from East Africa, supporting a model of expansion with serial founder events starting from this source. A number of individual populations diverge from these general patterns, including previously documented examples such as recent expansions of the Yoruba in Africa, Basques in Europe, and Yakut in Northern Asia. However, some unexpected demographic histories were also found, including low growth rates in the Hazara and Kalash from Pakistan and recent expansion of the Mozabites in North Africa.
Molecular biology and evolution 2010;27;2;385-93
Copy number variant detection in inbred strains from short read sequence data.
Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.
Summary: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets.
Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust
Bioinformatics (Oxford, England) 2010;26;4;565-7
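The hidden Markov model described in the copy number variant abstract above combines two signals per genomic window: read density and the rate of apparent heterozygous SNPs. A minimal sketch of that idea is given below. It is not the published algorithm: the three states, the Poisson emission models, the depth ratios, the heterozygous SNP rates and the transition probability are all hypothetical values chosen for illustration, and decoding uses the standard Viterbi algorithm.

```python
import math

STATES = ("loss", "normal", "gain")
# Hypothetical emission parameters, chosen only for illustration.
DEPTH_RATIO = {"loss": 0.5, "normal": 1.0, "gain": 1.5}  # expected depth relative to the mean
HET_RATE = {"loss": 0.05, "normal": 0.05, "gain": 2.0}   # expected apparent het SNPs per window

def log_poisson(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_cnv(depths, het_counts, mean_depth, stay=0.99):
    """Return the most likely copy-number state for each window."""
    switch = (1 - stay) / (len(STATES) - 1)

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    def log_emit(state, depth, hets):
        return (log_poisson(depth, mean_depth * DEPTH_RATIO[state])
                + log_poisson(hets, HET_RATE[state]))

    score = {s: math.log(1 / len(STATES)) + log_emit(s, depths[0], het_counts[0])
             for s in STATES}
    back = []
    for depth, hets in zip(depths[1:], het_counts[1:]):
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda q: prev[q] + log_trans(q, s))
            score[s] = prev[best] + log_trans(best, s) + log_emit(s, depth, hets)
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Synthetic windows: a duplicated segment with raised depth and apparent het SNPs.
depths = [30] * 5 + [45] * 5 + [30] * 5
het_counts = [0] * 5 + [2] * 5 + [0] * 5
path = viterbi_cnv(depths, het_counts, mean_depth=30)
print(path)
```

In an inbred strain, apparent heterozygous SNPs inside a window are unexpected, so they add evidence for a gain beyond what read depth alone provides; that is the intuition the sketch tries to capture.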
Floxin, a resource for genetically engineering mouse ESCs.
Department of Biochemistry and Biophysics, Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA.
We describe a method for the highly efficient and precise targeted modification of gene trap loci in mouse embryonic stem cells (ESCs). Through the Floxin method, gene trap mutations were reverted and new DNA sequences inserted using Cre recombinase and a shuttle vector, pFloxin. Floxin technology is applicable to the existing collection of 24,149 compatible gene trap cell lines, which should enable high-throughput modification of many genes in mouse ESCs.
Funded by: NIAMS NIH HHS: R01 AR054396, R01 AR054396-01A1, R01AR054396
Nature methods 2010;7;1;50-2
Family history of premature coronary heart disease and risk prediction in the EPIC-Norfolk prospective population study.
Department of Vascular Medicine, Academic Medical Center, Amsterdam, The Netherlands.
Objective: The value of a family history for coronary heart disease (CHD) in addition to established cardiovascular risk factors in predicting an individual's risk of CHD is unclear. In the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort, the authors tested whether adding family history of premature CHD in first-degree relatives improves risk prediction compared with the Framingham risk score (FRS) alone.
This study comprised 10,288 men and 12,553 women aged 40-79 years participating in the EPIC-Norfolk cohort who were followed for a mean of 10.9±2.1 years (mean±SD). The authors computed the FRS as well as a modified score taking into account family history of premature CHD. A family history of CHD was indeed associated with an increased risk of future CHD, independent of established risk factors (FRS-adjusted HR of 1.74 (95% CI 1.56 to 1.95) for family history of premature CHD). However, adding family history of CHD to the FRS resulted in a negative net reclassification of 2%. In the subgroup of individuals estimated to be at intermediate risk, family history of premature CHD resulted in an increase in net reclassification of 2%. The sensitivity increased by 0.4%, and the specificity decreased by 0.8%.
Conclusion: Although family history of CHD was an independent risk factor of future CHD, its use did not improve classification of individuals into clinically relevant risk categories based on the FRS. Among study participants at intermediate risk of CHD, adding family history of premature CHD resulted in, at best, a modest improvement in reclassification of individuals into a more accurate risk category.
Funded by: Cancer Research UK; Medical Research Council
Heart (British Cardiac Society) 2010;96;24;1985-9
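Net reclassification, the measure used in the EPIC-Norfolk abstract above, compares how a new risk score moves individuals between risk categories relative to an old one. The sketch below shows the usual categorical definition under stated assumptions: risk categories are coded as ordered integers, and the data are invented for illustration rather than taken from the study.

```python
def net_reclassification(old_cat, new_cat, event):
    """Categorical net reclassification improvement (NRI).

    old_cat and new_cat hold ordered risk categories (0 = low, 1 = intermediate,
    2 = high); event holds 1 for people who developed disease, 0 otherwise.
    """
    up_e = down_e = n_e = 0   # movements among people with an event
    up_n = down_n = n_n = 0   # movements among people without an event
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    nri_events = (up_e - down_e) / n_e        # moving up is correct for events
    nri_nonevents = (down_n - up_n) / n_n     # moving down is correct for non-events
    return nri_events + nri_nonevents, nri_events, nri_nonevents

# Invented example: four people with an event, four without.
old_cat = [0, 1, 1, 2, 0, 1, 1, 2]
new_cat = [1, 1, 2, 2, 0, 0, 2, 2]
event = [1, 1, 1, 1, 0, 0, 0, 0]
nri, nri_events, nri_nonevents = net_reclassification(old_cat, new_cat, event)
print(nri, nri_events, nri_nonevents)
```

A negative total, as the abstract reports for the whole cohort, means that more people were moved to a less appropriate category than to a more appropriate one.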
Neuronal MeCP2 is expressed at near histone-octamer levels and globally alters the chromatin state.
Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, UK.
MeCP2 is a nuclear protein with an affinity for methylated DNA that can recruit histone deacetylases. Deficiency or excess of MeCP2 causes severe neurological problems, suggesting that the number of molecules per cell must be precisely regulated. We quantified MeCP2 in neuronal nuclei and found that it is nearly as abundant as the histone octamer. Despite this high abundance, MeCP2 associates preferentially with methylated regions, and high-throughput sequencing showed that its genome-wide binding tracks methyl-CpG density. MeCP2 deficiency results in global changes in neuronal chromatin structure, including elevated histone acetylation and a doubling of histone H1. Neither change is detectable in glia, where MeCP2 occurs at lower levels. The mutant brain also shows elevated transcription of repetitive elements. Our data argue that MeCP2 may not act as a gene-specific transcriptional repressor in neurons, but might instead dampen transcriptional noise genome-wide in a DNA methylation-dependent manner.
Funded by: Wellcome Trust: 077224, 079643
Molecular cell 2010;37;4;457-68
Constitutional translocation breakpoint mapping by genome-wide paired-end sequencing identifies HACE1 as a putative Wilms tumour susceptibility gene.
Section Chair and Professor of Human Genetics, The Institute of Cancer Research, 15 Cotswold Road, Sutton SM2 5NG, UK.
Background: Localisation of the breakpoints of chromosomal translocations has aided the discovery of several disease genes but has traditionally required laborious investigation of chromosomes by fluorescent in situ hybridisation approaches. Here, a strategy that utilises genome-wide paired-end massively parallel DNA sequencing to rapidly map translocation breakpoints is reported. This method was used to fine map a de novo t(5;6)(q21;q21) translocation in a child with bilateral, young-onset Wilms tumour.
Methods and results: Genome-wide paired-end sequencing was performed for approximately 6 million randomly generated approximately 3 kb fragments from constitutional DNA containing the translocation, and six fragments in which one end mapped to chromosome 5 and the other to chromosome 6 were identified. This mapped the translocation breakpoints to within 1.7 kb. Then, PCR assays that amplified across the rearrangement junction were designed to characterise the breakpoints at sequence-level resolution. The 6q21 breakpoint transects and truncates HACE1, an E3 ubiquitin-protein ligase that has been implicated as a somatically inactivated target in Wilms tumourigenesis. To evaluate the contribution of HACE1 to Wilms tumour predisposition, the gene was mutationally screened in 450 individuals with Wilms tumour. One child with unilateral Wilms tumour and a truncating HACE1 mutation was identified.
Conclusions: These data indicate that constitutional disruption of HACE1 likely predisposes to Wilms tumour. However, HACE1 mutations are rare and therefore can only make a small contribution to Wilms tumour incidence. More broadly, this study demonstrates the utility of genome-wide paired-end sequencing in the delineation of apparently balanced chromosomal translocations, for which it is likely to become the method of choice.
Funded by: Cancer Research UK: 11886, 9024, C8620_A8857, C8620_A9024
Journal of medical genetics 2010;47;5;342-7
Common variants at 10 genomic loci influence hemoglobin A₁c levels via glycemic and nonglycemic pathways.
Human Genetics, Wellcome Trust Sanger Institute, Hinxton, U.K.
Objective: Glycated hemoglobin (HbA₁c), used to monitor and diagnose diabetes, is influenced by average glycemia over a 2- to 3-month period. Genetic factors affecting expression, turnover, and abnormal glycation of hemoglobin could also be associated with increased levels of HbA₁c. We aimed to identify such genetic factors and investigate the extent to which they influence diabetes classification based on HbA₁c levels.
Research design and methods: We studied associations with HbA₁c in up to 46,368 nondiabetic adults of European descent from 23 genome-wide association studies (GWAS) and 8 cohorts with de novo genotyped single nucleotide polymorphisms (SNPs). We combined studies using inverse-variance meta-analysis and tested mediation by glycemia using conditional analyses. We estimated the global effect of HbA₁c loci using a multilocus risk score, and used net reclassification to estimate genetic effects on diabetes screening.
Results: Ten loci reached genome-wide significant association with HbA₁c, including six new loci near FN3K (lead SNP/P value, rs1046896/P = 1.6 × 10⁻²⁶), HFE (rs1800562/P = 2.6 × 10⁻²⁰), TMPRSS6 (rs855791/P = 2.7 × 10⁻¹⁴), ANK1 (rs4737009/P = 6.1 × 10⁻¹²), SPTA1 (rs2779116/P = 2.8 × 10⁻⁹) and ATP11A/TUBGCP3 (rs7998202/P = 5.2 × 10⁻⁹), and four known HbA₁c loci: HK1 (rs16926246/P = 3.1 × 10⁻⁵⁴), MTNR1B (rs1387153/P = 4.0 × 10⁻¹¹), GCK (rs1799884/P = 1.5 × 10⁻²⁰) and G6PC2/ABCB11 (rs552976/P = 8.2 × 10⁻¹⁸). We show that associations with HbA₁c are partly a function of hyperglycemia associated with 3 of the 10 loci (GCK, G6PC2 and MTNR1B). The seven nonglycemic loci accounted for a 0.19 (% HbA₁c) difference between the extreme 10% tails of the risk score, and would reclassify ∼2% of a general white population screened for diabetes with HbA₁c.
Conclusions: GWAS identified 10 genetic loci reproducibly associated with HbA₁c. Six are novel and seven map to loci where rarer variants cause hereditary anemias and iron storage disorders. Common variants at these loci likely influence HbA₁c levels via erythrocyte biology, and confer a small but detectable reclassification of diabetes diagnosis by HbA₁c.
Funded by: Chief Scientist Office: CZB/4/710; Medical Research Council: G0401527, G0701863, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128, MC_UP_A100_1003; NIDDK NIH HHS: R01 DK072193
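The inverse-variance meta-analysis mentioned in the hemoglobin abstract above pools per-study effect estimates by weighting each one by the reciprocal of its squared standard error. A minimal fixed-effect sketch follows; the effect sizes and standard errors are invented for illustration and are not taken from the study, and the function name is hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study effect estimates."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return pooled, pooled_se, p

# Invented per-study effects (for example, % HbA1c per allele) and standard errors.
pooled, pooled_se, p_value = inverse_variance_meta([0.10, 0.06], [0.02, 0.02])
print(f"beta = {pooled:.3f}, se = {pooled_se:.4f}, p = {p_value:.1e}")
```

Because the weights grow with the square of each study's precision, large cohorts dominate the pooled estimate, which is why combining many studies can bring modest per-allele effects to genome-wide significance.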
Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index.
Metabolism Initiative and Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA.
Obesity is globally prevalent and highly heritable, but its underlying genetic factors remain largely elusive. To identify genetic loci for obesity susceptibility, we examined associations between body mass index and ∼ 2.8 million SNPs in up to 123,865 individuals with targeted follow up of 42 SNPs in up to 125,931 additional individuals. We confirmed 14 known obesity susceptibility loci and identified 18 new loci associated with body mass index (P < 5 × 10⁻⁸), one of which includes a copy number variant near GPRC5B. Some loci (at MC4R, POMC, SH2B1 and BDNF) map near key hypothalamic regulators of energy balance, and one of these loci is near GIPR, an incretin receptor. Furthermore, genes in other newly associated loci may provide new insights into human body weight regulation.
Funded by: British Heart Foundation; Cancer Research UK; Chief Scientist Office: CZB/4/710; Department of Health; Medical Research Council: G0000934, G0401527, G0501184, G0600331, G0600705, G0601261, G0701863, G0801056, G0900554, G9521010, G9824984, MC_QA137934, MC_U106179471, MC_U106179472, MC_U106188470, MC_U127561128, MC_U137686857; NCI NIH HHS: CA047988, CA49449, CA50385, CA65725, CA67262, CA87969, U01-CA098233; NCRR NIH HHS: M01-RR00425, U54-RR020278, UL1-RR025005; NHGRI NIH HHS: HG002651, N01-HG-65403, T32 HG000040, T32 HG000040-17, T32-HG00040, U01-HG004399, U01-HG004402, Z01-HG000024; NHLBI NIH HHS: HL084729, HL71981, K99-HL094535, N01-HC15103, N01-HC25195, N01-HC35129, N01-HC45133, N01-HC55015, N01-HC55016, N01-HC55018, N01-HC55019, N01-HC55020, N01-HC55022, N01-HC55222, N01-HC75150, N01-HC85079, N01-HC85080, N01-HC85081, N01-HC85082, N01-HC85083, N01-HC85084, N01-HC85085, N01-HC85086, N01-N01HC-55021, N02-HL64278, R01 HL071981, R01 HL087647, R01-HL086694, R01-HL087641, R01-HL087647, R01-HL087652, R01-HL087676, R01-HL087679, R01-HL087700, R01-HL088119, R01-HL59367, U01 HL054527, U01-HL080295, U01-HL084756, U01-HL72515; NIA NIH HHS: N01-AG12100, N01-AG12109, R01-AG031890; NIAAA NIH HHS: AA014041, AA07535, AA10248, AA13320, AA13321, AA13326, K05 AA017688; NIAMS NIH HHS: K08 AR055688, K08 AR055688-03, K08 AR055688-04; NIDA NIH HHS: DA12854, R01 DA012854; NIDDK NIH HHS: DK062370, DK063491, DK072193, DK46200, DK58845, F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, K23-DK080145, P30-DK072488, R01 DK072193, R01 DK072193-05, R01-DK073490, R01-DK075787, R01DK068336, R01DK075681, U01 DK062370, U01 DK062370-08, U01-DK062418; NIGMS NIH HHS: U01-GM074518; NIMH NIH HHS: MH084698, R01-MH59160, R01-MH59565, R01-MH59566, R01-MH59571, R01-MH59586, R01-MH59587, R01-MH59588, R01-MH60870, R01-MH60879, R01-MH61675, R01-MH63706, R01-MH67257, R01-MH79469, R01-MH79470, R01-MH81800, RL1-MH083268; PHS HHS: 263-MA-410953; Wellcome Trust: 064890, 068545, 072960, 075491, 076113, 077016, 079557, 079895, 081682, 083270, 085301, 086596, 090532
Nature genetics 2010;42;11;937-48
Embryos, Genes and Birth Defects 2010;Chapter 10;231-62
Pooled analysis indicates that the GSTT1 deletion, GSTM1 deletion, and GSTP1 Ile105Val polymorphisms do not modify breast cancer risk in BRCA1 and BRCA2 mutation carriers.
Division of Genetics and Population Health, Queensland Institute of Medical Research, 300 Herston Rd, Herston 4006, Australia.
The GSTP1, GSTM1, and GSTT1 detoxification genes all have functional polymorphisms that are common in the general population. A single study of 320 BRCA1/2 carriers previously assessed their effect in BRCA1 or BRCA2 mutation carriers. This study showed no evidence for altered risk of breast cancer for individuals with the GSTT1 and GSTM1 deletion variants, but did report that the GSTP1 Ile105Val (rs1695) variant was associated with increased breast cancer risk in carriers. We investigated the association between these three GST polymorphisms and breast cancer risk using existing data from 718 women BRCA1 and BRCA2 mutation carriers from Australia, the UK, Canada, and the USA. Data were analyzed within a proportional hazards framework using Cox regression. There was no evidence to show that any of the polymorphisms modified disease risk for BRCA1 or BRCA2 carriers, and there was no evidence for heterogeneity between sites. These results support the need for replication studies to confirm or refute hypothesis-generating studies.
Funded by: Canadian Institutes of Health Research; Cancer Research UK: 10118, 11174, C1287/A10118, C1287/A8874; NCI NIH HHS: R01 CA083855, R01 CA083855-01, R01 CA083855-02, R01 CA083855-03, R01 CA083855-04, R01 CA083855-05, R01 CA083855-06, R01 CA083855-07, R01 CA083855-08, R01 CA083855-09, R01 CA083855-10, R01 CA083855-11, R01 CA102776, R01 CA102776-01A1, R01 CA102776-02, R01 CA102776-03, R01 CA102776-04, R01 CA102776-05, R01-CA083855, R01-CA102776
Breast cancer research and treatment 2010;122;1;281-5
Low-density lipoprotein receptor-related protein 5 polymorphisms are associated with bone mineral density in Greek postmenopausal women: an interaction with calcium intake.
Department of Dietetics and Nutrition, Harokopio University, 17671 Athens, Greece.
The low-density lipoprotein receptor-related protein 5 (LRP5) has been shown to play a significant role in bone biology. This study aimed to assess the association of four common polymorphisms of the LRP5 gene with bone mineral density (BMD) and possible gene × calcium intake interactions in Greek postmenopausal women. For this observational cross-sectional association study, healthy postmenopausal women (N=578) were recruited (between December 2006 and January 2008) and genotyped for four polymorphisms (rs1784235, rs491347, rs4988321, and rs4988330) in the LRP5 gene. Measurements of BMD were performed and detailed medical, dietary, and anthropometric data were recorded. Student t tests and multiple linear regression models were applied after controlling for potential covariates (i.e., age, weight, height, and calcium intake). None of the polymorphisms was associated with the presence of osteoporosis, fractures, or hip BMD. All polymorphisms were associated with unadjusted spine BMD, with the exception of rs4988330. Only rs4988321 was associated with adjusted spine BMD, where the presence of the A allele was associated with significantly lower spine BMD compared with the GG genotype (P=0.002). An interaction of the rs4988321 polymorphism with calcium intake (P=0.016) was found. The carriers of the A allele demonstrated significantly lower spine BMD compared to GG homozygotes (P=0.001) only in the lowest calcium intake group (<680 mg/day), whereas in the highest calcium intake group no differences were found in BMD between genotypes. These findings demonstrate that both rs4988321 polymorphism and its interaction with calcium intake are associated with BMD, whereas higher calcium intake was shown to decrease the negative effect of this polymorphism on BMD.
Journal of the American Dietetic Association 2010;110;7;1078-83
A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies.
Max Planck Institutes Tübingen, Tübingen, Germany. firstname.lastname@example.org
Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors resulting in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/.
Funded by: Wellcome Trust: WT077192/Z/05/Z
PLoS computational biology 2010;6;5;e1000770
Genome-wide end-sequenced BAC resources for the NOD/MrkTac and NOD/ShiLtJ mouse genomes.
The Wellcome Trust Sanger Institute, Hinxton, UK. email@example.com
Non-obese diabetic (NOD) mice spontaneously develop type 1 diabetes (T1D) due to the progressive loss of insulin-secreting beta-cells by an autoimmune driven process. NOD mice represent a valuable tool for studying the genetics of T1D and for evaluating therapeutic interventions. Here we describe the development and characterization by end-sequencing of bacterial artificial chromosome (BAC) libraries derived from NOD/MrkTac (DIL NOD) and NOD/ShiLtJ (CHORI-29), two commonly used NOD substrains. The DIL NOD library is composed of 196,032 BACs and the CHORI-29 library is composed of 110,976 BACs. The average depth of genome coverage of the DIL NOD library, estimated from mapping the BAC end-sequences to the reference mouse genome sequence, was 7.1-fold across the autosomes and 6.6-fold across the X chromosome. Clones from this library have an average insert size of 150 kb and map to over 95.6% of the reference mouse genome assembly (NCBIm37), covering 98.8% of Ensembl mouse genes. By the same metric, the CHORI-29 library has an average depth over the autosomes of 5.0-fold and 2.8-fold coverage of the X chromosome, the reduced X chromosome coverage being due to the use of a male donor for this library. Clones from this library have an average insert size of 205 kb and map to 93.9% of the reference mouse genome assembly, covering 95.7% of Ensembl genes. We have identified and validated 191,841 single nucleotide polymorphisms (SNPs) for DIL NOD and 114,380 SNPs for CHORI-29. In total we generated 229,736,133 bp of sequence for the DIL NOD and 121,963,211 bp for the CHORI-29. These BAC libraries represent a powerful resource for functional studies, such as gene targeting in NOD embryonic stem (ES) cell lines, and for sequencing and mapping experiments.
Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust: 062023, 077198
Leena Peltonen 1952-2010 Obituary
Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging.
Institute for Molecular Medicine Finland, FIMM, University of Helsinki, FI-00014 Helsinki, Finland.
The combining of genome-wide association (GWA) data across populations represents a major challenge for massive global meta-analyses. Genotype imputation using densely genotyped reference samples facilitates the combination of data across different genotyping platforms. HapMap data is typically used as a reference for single nucleotide polymorphism (SNP) imputation and tagging copy number polymorphisms (CNPs). However, the advantage of having population-specific reference panels for founder populations has not been evaluated. We looked at the properties and impact of adding 81 individuals from a founder population to HapMap3 reference data on imputation quality, CNP tagging, and power to detect association in simulations and in an independent cohort of 2138 individuals. The gain in SNP imputation accuracy was highest among low-frequency markers (minor allele frequency [MAF] < 5%), for which adding the population-specific samples to the reference set increased the median R(2) between imputed and genotyped SNPs from 0.90 to 0.94. Accuracy also increased in regions with high recombination rates. Similarly, a reference set with population-specific extension facilitated the identification of better tag-SNPs for a subset of CNPs; for 4% of CNPs the R(2) between SNP genotypes and CNP intensity in the independent population cohort was at least twice as high as without the extension. We conclude that even a relatively small population-specific reference set yields considerable benefits in SNP imputation, CNP tagging accuracy, and the power to detect associations in founder populations and population isolates in particular.
Funded by: Wellcome Trust: WT089061/Z/09/Z, WT089062/Z/09/Z
Genome research 2010;20;10;1344-51
Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally.
Health Protection Agency Malaria Reference Laboratory, Immunology Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom. firstname.lastname@example.org
Background: Malaria in humans is caused by apicomplexan parasites belonging to 5 species of the genus Plasmodium. Infections with Plasmodium ovale are widely distributed but rarely investigated, and the resulting burden of disease is not known. Dimorphism in defined genes has led to P. ovale parasites being divided into classic and variant types. We hypothesized that these dimorphs represent distinct parasite species.
Methods: Multilocus sequence analysis of 6 genetic characters was carried out among 55 isolates from 12 African and 3 Asia-Pacific countries.
Results: Each genetic character displayed complete dimorphism and segregated perfectly between the 2 types. Both types were identified in samples from Ghana, Nigeria, São Tomé, Sierra Leone, and Uganda and have been described previously in Myanmar. Splitting of the 2 lineages is estimated to have occurred between 1.0 and 3.5 million years ago in hominid hosts.
Conclusions: We propose that P. ovale comprises 2 nonrecombining species that are sympatric in Africa and Asia. We speculate on possible scenarios that could have led to this speciation. Furthermore, the relatively high frequency of imported cases of symptomatic P. ovale infection in the United Kingdom suggests that the morbidity caused by ovale malaria has been underestimated.
Funded by: NIGMS NIH HHS: R01 GM080586; Wellcome Trust: 093956
The Journal of infectious diseases 2010;201;10;1544-50
Common variants in the ATP2B1 gene are associated with susceptibility to hypertension: the Japanese Millennium Genome Project.
Department of Basic Medical Research and Education, Ehime University Graduate School of Medicine, Toon-City, Ehime, Japan. email@example.com
Hypertension is one of the most common complex genetic disorders. We have described previously 38 single nucleotide polymorphisms (SNPs) with suggestive association with hypertension in Japanese individuals. In this study we extend our previous findings by analyzing a large sample of Japanese individuals (n=14 105) for the most associated SNPs. We also conducted replication analyses in Japanese of susceptibility loci for hypertension identified recently from genome-wide association studies of European ancestries. Association analysis revealed significant association of the ATP2B1 rs2070759 polymorphism with hypertension (P=5.3×10(-5); allelic odds ratio: 1.17 [95% CI: 1.09 to 1.26]). Additional SNPs in ATP2B1 were subsequently genotyped, and the most significant association was with rs11105378 (odds ratio: 1.31 [95% CI: 1.21 to 1.42]; P=4.1×10(-11)). Association of rs11105378 with hypertension was cross-validated by replication analysis with the Global Blood Pressure Genetics consortium data set (odds ratio: 1.13 [95% CI: 1.05 to 1.21]; P=5.9×10(-4)). Mean adjusted systolic blood pressure was highly significantly associated with the same SNP in a meta-analysis with individuals of European descent (P=1.4×10(-18)). ATP2B1 mRNA expression levels in umbilical artery smooth muscle cells were found to be significantly different among rs11105378 genotypes. Seven SNPs discovered in published genome-wide association studies were also genotyped in the Japanese population. In the combined analysis with the 3 replicated genes, FGF5 rs1458038, CYP17A1 rs1004467, and CSK rs1378942, the odds ratio of the highest risk group was 2.27 (95% CI: 1.65 to 3.12; P=4.6×10(-7)) compared with the lower risk group. In summary, this study confirmed common genetic variation in ATP2B1, as well as FGF5, CYP17A1, and CSK, to be associated with blood pressure levels and risk of hypertension.
Funded by: Medical Research Council: G0400874, G0401527, G0801056, MC_U105630924, MC_UP_A100_1003
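As a rough consistency check on the figures quoted in the abstract above, a reported odds ratio and its 95% confidence interval can be converted back into an approximate two-sided P value. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale. The function name `p_from_or_ci` is illustrative and does not come from the paper, and the study's own P values may have been obtained with a different test.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumption: the interval is a Wald interval, symmetric on the log scale,
    so its half-width equals z_crit times the standard error of log(OR).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Values quoted for rs11105378: odds ratio 1.31, 95% CI 1.21 to 1.42.
z, p = p_from_or_ci(1.31, 1.21, 1.42)
print(f"z = {z:.2f}, P = {p:.1e}")
```

Under these assumptions the result is of the order of 10(-11), in line with the reported P value. Small differences are expected because the published odds ratio and interval are rounded to two decimal places.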
Rec8-containing cohesin maintains bivalents without turnover during the growing phase of mouse oocytes.
Department of Biochemistry, University of Oxford, Oxford, United Kingdom.
During female meiosis, bivalent chromosomes are thought to be held together from birth until ovulation by sister chromatid cohesion mediated by cohesin complexes whose ring structure depends on kleisin subunits, either Rec8 or Scc1. Because cohesion is established at DNA replication in the embryo, its maintenance for such a long time may require cohesin turnover. To address whether Rec8- or Scc1-containing cohesin holds bivalents together and whether it turns over, we created mice whose kleisin subunits can be cleaved by TEV protease. We show by microinjection experiments and confocal live-cell imaging that Rec8 cleavage triggers chiasmata resolution during meiosis I and sister centromere disjunction during meiosis II, while Scc1 cleavage triggers sister chromatid disjunction in the first embryonic mitosis, demonstrating a dramatic transition from Rec8- to Scc1-containing cohesin at fertilization. Crucially, activation of an ectopic Rec8 transgene during the growing phase of Rec8(TEV/TEV) oocytes does not prevent TEV-mediated bivalent destruction, implying little or no cohesin turnover for ≥2 wk during oocyte growth. We suggest that the inability of oocytes to regenerate cohesion may contribute to age-related meiosis I errors.
Funded by: Cancer Research UK; Medical Research Council: G0701161, G0901046; Wellcome Trust
Genes & development 2010;24;22;2505-16
Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.
Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.
Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.
Funded by: Medical Research Council: G0800784; Wellcome Trust
Genes & development 2010;24;24;2772-7
Methodological challenges of genome-wide association analysis in Africa.
Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
Medical research in Africa has yet to benefit from the advent of genome-wide association (GWA) analysis, partly because the genotyping tools and statistical methods that have been developed for European and Asian populations struggle to deal with the high levels of genome diversity and population structure in Africa. However, the haplotypic diversity of African populations might help to overcome one of the major roadblocks in GWA research, the fine mapping of causal variants. We review the methodological challenges and consider how GWA studies in Africa will be transformed by new approaches in statistical imputation and large-scale genome sequencing.
Funded by: Medical Research Council: G0600230, G0600718, G19/9; Wellcome Trust: 077383, 082370
Nature reviews. Genetics 2010;11;2;149-60
Biological, clinical and population relevance of 95 loci for blood lipids.
Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 x 10(-8)), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.
Funded by: British Heart Foundation: PG/02/128, PG/08/094, PG/08/094/26019, RG/07/005/23633, SP/08/005/25115; Chief Scientist Office: CZB/4/710; FIC NIH HHS: TW05596; Intramural NIH HHS; Medical Research Council: G0000934, G0401527, G0601966, G0700931, G0701863, G0801056, G0801566, G9521010, G9521010D, MC_QA137934, MC_U106179471, MC_U106188470, MC_U127561128; NCI NIH HHS: CA 047988; NCRR NIH HHS: M01-RR00425, RR20649, U54 RR020278, UL1RR025005; NHGRI NIH HHS: 1Z01 HG000024, N01-HG-65403, T32 HG00040, U01HG004402; NHLBI NIH HHS: 5R01HL087679-02, 5R01HL08770003, 5R01HL08821502, HL 04381, HL 080467, HL-54776, HL085144, K99 HL098364, K99 HL098364-01, K99HL094535, N01 HC-15103, N01 HC-55222, N01-HC-25195, N01-HC-35129, N01-HC-45133, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N02-HL-6-4278, R01 HL087647, R01 HL087676, R01 HL089650, R01HL086694, R01HL087641, R01HL087652, R01HL59367, RC1 HL099634, RC1 HL099634-02, RC1 HL099793, RC2 HL101864, RC2 HL102419, T32HL007208, U01 HL069757, U01 HL080295; NIA NIH HHS: N01-AG-12100; NICHD NIH HHS: R24 HD050924; NIDDK NIH HHS: 5R01DK06833603, 5R01DK07568102, DK062370, DK063491, DK072193, DK078150, DK56350, R01 DK072193, R01 DK078150, U01 DK062370, U01 DK062418; NIEHS NIH HHS: ES10126; NIGMS NIH HHS: T32 GM007092; PHS HHS: HHSN268200625226C; Wellcome Trust: 068545/Z/02, 076113/B/04/Z, 077016/Z/05/Z, 079895
The systematic functional analysis of Plasmodium protein kinases identifies essential regulators of mosquito transmission.
Institute of Genetics, QMC, University of Nottingham, Nottingham NG7 2UH, UK. firstname.lastname@example.org
Although eukaryotic protein kinases (ePKs) contribute to many cellular processes, only three Plasmodium falciparum ePKs have thus far been identified as essential for parasite asexual blood stage development. To identify pathways essential for parasite transmission between their mammalian host and mosquito vector, we undertook a systematic functional analysis of ePKs in the genetically tractable rodent parasite Plasmodium berghei. Modeling domain signatures of conventional ePKs identified 66 putative Plasmodium ePKs. Kinomes are highly conserved between Plasmodium species. Using reverse genetics, we show that 23 ePKs are redundant for asexual erythrocytic parasite development in mice. Phenotyping mutants at four life cycle stages in Anopheles stephensi mosquitoes revealed functional clusters of kinases required for sexual development and sporogony. Roles for a putative SR protein kinase (SRPK) in microgamete formation, a conserved regulator of clathrin uncoating (GAK) in ookinete formation, and a likely regulator of energy metabolism (SNF1/KIN) in sporozoite development were identified.
Funded by: Medical Research Council: G0501670, G0900109; Wellcome Trust: 087656, WT078335MA, WT089085/Z/09/Z
Cell host & microbe 2010;8;4;377-87
An adaptable two-color flow cytometric assay to quantitate the invasion of erythrocytes by Plasmodium falciparum parasites.
Sanger Malaria Programme, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom.
Plasmodium falciparum genotyping has recently undergone a revolution, and genome-wide genotype datasets are now being collected for large numbers of parasite isolates. By contrast, phenotyping technologies have lagged behind, with few high throughput phenotyping platforms available. Invasion of human erythrocytes by Plasmodium falciparum is a phenotype of particular interest because of its central role in parasite development. Invasion is a variable phenotype influenced by natural genetic variation in both the parasite and host and is governed by multiple overlapping and in some instances redundant parasite-erythrocyte interactions. To facilitate the scale-up of erythrocyte invasion phenotyping, we have developed a novel platform based on two-color flow cytometry that distinguishes parasite invasion from parasite growth. Target cells that had one or more receptors removed using enzymatic treatment were prelabeled with intracellular dyes CFDA-SE or DDAO-SE, incubated with P. falciparum parasites, and parasites that had invaded either labeled or unlabeled cells were detected with fluorescent DNA-intercalating dyes Hoechst 33342 or SYBR Green I. Neither cell label interfered with erythrocyte invasion, and the combination of cell and parasite dyes recapitulated known invasion phenotypes for three standard laboratory strains. Three different dye combinations with minimal overlap have been validated, meaning the same assay can be adapted to instruments harboring several different combinations of laser lines. The assay is sensitive, operates in a 96-well format, and can be used to quantitate the impact of natural or experimental genetic variation on erythrocyte invasion efficiency.
Funded by: Wellcome Trust
Cytometry. Part A : the journal of the International Society for Analytical Cytology 2010;77;11;1067-74
De novo apparently balanced translocations in man are predominantly paternal in origin and associated with a significant increase in paternal age.
Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury SP2 8TE, UK. email@example.com
Background: Congenital chromosome abnormalities are relatively common in our species and among structural abnormalities the most common class is balanced reciprocal translocations. Determining the parental origin of de novo balanced translocations may provide insights into how and when they arise. While there is a general paternal bias in the origin of non-recurrent unbalanced rearrangements, there are few data on parental origin of non-recurrent balanced rearrangements.
Methods: The parental origin of a series of de novo balanced reciprocal translocations was determined using DNA from flow sorted derivative chromosomes and linkage analysis.
Results: Of 27 translocations, we found 26 to be of paternal origin and only one of maternal origin. We also found the paternally derived translocations to be associated with a significantly increased paternal age (p<0.008).
Conclusion: Our results suggest there is a very pronounced paternal bias in the origin of all non-recurrent reciprocal translocations and that they may arise during one of the numerous mitotic divisions that occur in the spermatogonial germ cells prior to meiosis.
Funded by: Wellcome Trust: WT077008
Journal of medical genetics 2010;47;2;112-5
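The parental-origin counts in the abstract above (26 paternal, 1 maternal, out of 27) can be compared with the expectation of equal paternal and maternal origin using an exact binomial test. The sketch below is illustrative only: the function name `binomial_two_sided_p` is hypothetical, the 50:50 null is an assumption made here, and this is not the test behind the quoted p<0.008, which refers to paternal age.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided binomial P value for k successes in n trials.

    Assumption: null probability of 0.5, under which the distribution is
    symmetric, so the two-sided value is twice the smaller tail (capped at 1).
    """
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * one_sided)

# 26 of 27 translocations were of paternal origin.
p = binomial_two_sided_p(26, 27)
print(f"P = {p:.2e}")
```

With these counts the two tails together contain 56 of the 2^27 equally likely outcomes, so the departure from a 50:50 split is far too large to attribute to chance under this simple null.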
CpG islands influence chromatin structure via the CpG-binding protein Cfp1.
Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, UK.
CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides. Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity. In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro. Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.
Funded by: Cancer Research UK; Medical Research Council: G0800026; Wellcome Trust: 079643, 091580, 098051
A visual migraine aura locus maps to 9q21-q22.
Folkhälsan Research Center, Biomedicum Helsinki, PO Box 63, 00014 University of Helsinki, Finland. firstname.lastname@example.org
Objective: To identify susceptibility loci for visual migraine aura in migraine families primarily affected with scintillating scotoma type of aura.
Methods: We included Finnish migraine families with at least 2 affected family members with scintillating scotoma as defined by the International Criteria for Headache Disorders-II. A total of 36 multigenerational families containing 351 individuals were included, 185 of whom have visual aura and 159 have scintillating scotoma. Parametric and nonparametric linkage analyses were performed with 378 microsatellite markers. The most promising linkage loci found were fine-mapped with additional microsatellite markers.
Results: A novel locus on chromosome 9q22-q31 for migraine aura was identified (HLOD = 4.7 at 104 cM). Fine-mapping identified a shared haplotype segment of 12 cM (9.8 Mb) on 9q21-q22 among the aura affected. Four other loci showed linkage to aura: a locus on 12p13 showed significant evidence of linkage, and suggestive evidence of linkage was detected to loci on chromosomes 5q13, 6q25, and 13q14.
Conclusions: A novel visual migraine aura locus has been mapped to chromosome 9q21-q22. Interestingly, this region has previously been linked to occipitotemporal lobe epilepsy with prominent visual symptoms. Our finding further supports a shared genetic background in migraine and epilepsy and suggests that susceptibility variant(s) to visual aura for both of these traits are located in the 9q21-q22 locus.
Funded by: NIGMS NIH HHS: GM053275
Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.
Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. email@example.com
Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.
Funded by: Wellcome Trust: WT 085775/Z/08/Z
Genome biology 2010;11;4;R41
Variants near DMRT1, TERT and ATF7IP are associated with testicular germ cell cancer.
Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey, UK. firstname.lastname@example.org
We conducted a genome-wide association study for testicular germ cell tumor, genotyping 298,782 SNPs in 979 affected individuals and 4,947 controls from the UK and replicating associations in a further 664 cases and 3,456 controls. We identified three new susceptibility loci, two of which include genes that are involved in telomere regulation. We identified two independent signals within the TERT-CLPTM1L locus on chromosome 5, which has previously been associated with multiple other cancers (rs4635969, OR=1.54, P=1.14x10(-23); rs2736100, OR=1.33, P=7.55x10(-15)). We also identified a locus on chromosome 12 (rs2900333, OR=1.27, P=6.16x10(-10)) that contains ATF7IP, a regulator of TERT expression. Finally, we identified a locus on chromosome 9 (rs755383, OR=1.37, P=1.12x10(-23)), containing the sex determination gene DMRT1, which has been linked to teratoma susceptibility in mice.
Funded by: Cancer Research UK; Department of Health; Medical Research Council: G0000934; Wellcome Trust: 068545/Z/02
Nature genetics 2010;42;7;604-7
Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a.
Division of Child and Adolescent Psychiatry and Child Development, Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 1201 Welch Road, Stanford, CA 94304-5485, USA. email@example.com
Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.
European journal of human genetics : EJHG 2010;18;4;479-84
The Swedish new variant of Chlamydia trachomatis: genome sequence, morphology, cell tropism and phenotypic characterization.
National Reference Laboratory for Pathogenic Neisseria, Department of Laboratory Medicine, Orebro University Hospital, Orebro, Sweden. firstname.lastname@example.org
Chlamydia trachomatis is a major cause of bacterial sexually transmitted infections worldwide. In 2006, a new variant of C. trachomatis (nvCT), carrying a 377 bp deletion within the plasmid, was reported in Sweden. This deletion included the targets used by the commercial diagnostic systems from Roche and Abbott. The nvCT is clonal (serovar/genovar E) and it spread rapidly in Sweden, undiagnosed by these systems. The degree of spread may also indicate an increased biological fitness of nvCT. The aims of this study were to describe the genome of nvCT, to compare the nvCT genome to all available C. trachomatis genome sequences and to investigate the biological properties of nvCT. An early nvCT isolate (Sweden2) was analysed by genome sequencing, growth kinetics, microscopy, cell tropism assay and antimicrobial susceptibility testing. It was compared with relevant C. trachomatis isolates, including a similar serovar E C. trachomatis wild-type strain that circulated in Sweden prior to the initially undetected expansion of nvCT. The nvCT genome does not contain any major genetic polymorphisms - the genes for central metabolism, development cycle and virulence are conserved - or phenotypic characteristics that indicate any altered biological fitness. This is supported by the observations that the nvCT and wild-type C. trachomatis infections are very similar in terms of epidemiological distribution, and that differences in clinical signs are only described, in one study, in women. In conclusion, the nvCT does not appear to have any altered biological fitness. Therefore, the rapid transmission of nvCT in Sweden was due to the strong diagnostic selective advantage and its introduction into a high-frequency transmitting population.
Funded by: Wellcome Trust: 080348
Microbiology (Reading, England) 2010;156;Pt 5;1394-404
Chemokine ligand 2 genetic variants, serum monocyte chemoattractant protein-1 levels, and the risk of coronary artery disease.
Department of Vascular Medicine, Academic Medical Center, Amsterdam, the Netherlands. email@example.com
Objective: In humans, evidence about the association between levels of monocyte chemoattractant protein-1 (MCP-1), its coding gene chemokine (C-C motif) ligand 2 (CCL2), and risk of coronary artery disease (CAD) is contradictory.
Methods and results: We performed a nested case-control study in the prospective EPIC-Norfolk cohort investigating the relationship between CCL2 single-nucleotide polymorphisms (SNPs), MCP-1 concentrations, and the risk of future CAD. Cases (n=1138) were apparently healthy men and women aged 45 to 79 years who developed fatal or nonfatal CAD during a mean follow-up of 6 years. Controls (n=2237) were matched by age, sex, and enrollment time. Linear regression analysis showed no association between CCL2 SNPs and MCP-1 serum concentrations, nor did we find a significant association between MCP-1 serum levels and risk of future CAD. Finally, Cox regression analysis showed no significant association between CCL2 SNPs and future CAD risk. In addition, we did not find any robust associations between the CCL2 haplotypes and MCP-1 serum concentration or future CAD risk.
Conclusions: Our data do not support previous publications indicating that MCP-1 is involved in the pathogenesis of CAD.
Funded by: Cancer Research UK; Medical Research Council
Arteriosclerosis, thrombosis, and vascular biology 2010;30;7;1460-6
Somatic structural rearrangements in genetically engineered mouse mammary tumors.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
Background: Here we present the first paired-end sequencing of tumors from genetically engineered mouse models of cancer to determine how faithfully these models recapitulate the landscape of somatic rearrangements found in human tumors. These were models of Trp53-mutated breast cancer, Brca1- and Brca2-associated hereditary breast cancer, and E-cadherin (Cdh1) mutated lobular breast cancer.
Results: We show that although Brca1- and Brca2-deficient mouse mammary tumors have a defect in the homologous recombination pathway, there is no apparent difference in the type or frequency of somatic rearrangements found in these cancers when compared to other mouse mammary cancers, and tumors from all genetic backgrounds showed evidence of microhomology-mediated repair and non-homologous end-joining processes. Importantly, mouse mammary tumors were found to carry fewer structural rearrangements than human mammary cancers and expressed in-frame fusion genes. Like the fusion genes found in human mammary tumors, these were not recurrent. One mouse tumor was found to contain an internal deletion of exons of the Lrp1b gene, which led to a smaller in-frame transcript. We found internal in-frame deletions in the human ortholog of this gene in a significant number (4.2%) of human cancer cell lines.
Conclusions: Paired-end sequencing of mouse mammary tumors revealed that they display significant heterogeneity in their profiles of somatic rearrangement but, importantly, fewer rearrangements than cognate human mammary tumors, probably because these cancers have been induced by strong driver mutations engineered into the mouse genome. Both human and mouse mammary cancers carry expressed fusion genes and conserved homozygous deletions.
Funded by: Cancer Research UK; Medical Research Council: G0800024; Wellcome Trust: 088340, 093867
Genome biology 2010;11;10;R100
The use of DNA transposons for cancer gene discovery in mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Insertional mutagenesis in mice is a potent instrument for cancer gene discovery. Until recently, retroviruses were the main experimental tools in this field and application of insertional mutagenesis was limited to tissues for which these agents have tropism, namely hemopoietic cells and mammary epithelium. However, the field has been revolutionized and greatly expanded with the recent reanimation of the transposons, a highly flexible group of insertional mutagens first discovered in maize, which have now been adapted for use in mammalian cells. Transposons not only extend the application of insertional mutagenesis to any tissue of choice, but also allow a more extensive and unbiased coverage of the genome, can be designed to selectively activate or inactivate genes, and are highly amenable to temporal and spatial control. This chapter gives an overview of the design and application of transposons to cancer gene discovery in mice.
Methods in enzymology 2010;477;91-106
The Molecular Basis of Leukaemia and Lymphoma
Postgraduate Haematology 2010;Chapter 21;380-94
Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis.
Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA.
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P<5x10(-8). These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
Funded by: Chief Scientist Office: CZB/4/710; Department of Health: DHCS/07/07/008; Medical Research Council: G0600331, G0601261, G0700222, G0700222(81696), G0701863, MC_U106179471, MC_U106179474, MC_U127592696; NCRR NIH HHS: UL1RR025005; NHGRI NIH HHS: 1 Z01 HG000024, U01HG004171, U01HG004399, U01HG004402; NHLBI NIH HHS: 1K99HL094535-01A1, N01-HC-25195, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N02-HL-6-4278, R01HL086694, R01HL087641, R01HL59367; NIAMS NIH HHS: 1K08AR055688, K08 AR055688, K08 AR055688-03; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK069922, DK072193, DK073490, DK078616, DK58845, K23-DK65978, K24-DK080140, R01 DK029867, R01 DK072193; PHS HHS: HHSN268200625226C; Wellcome Trust: 064890, 072960, 075491, 076113, 077016, 079557, 081682, 083270, 086596, 088885, 090532
Nature genetics 2010;42;7;579-89
Comparison of two DNA microarrays for detection of plasmid-mediated antimicrobial resistance and virulence factor genes in clinical isolates of Enterobacteriaceae and non-Enterobacteriaceae.
Department of Clinical Microbiology, Sir Patrick Dun Translational Research Laboratory, School of Medicine, University of Dublin, Trinity College, St James's Hospital Campus, Dublin 8, Ireland. email@example.com
A DNA microarray was developed to detect plasmid-mediated antimicrobial resistance (AR) and virulence factor (VF) genes in clinical isolates of Enterobacteriaceae and non-Enterobacteriaceae. The array was validated with the following bacterial species: Escherichia coli (n=17); Klebsiella pneumoniae (n=3); Enterobacter spp. (n=6); Acinetobacter genospecies 3 (n=1); Acinetobacter baumannii (n=1); Pseudomonas aeruginosa (n=2); and Stenotrophomonas maltophilia (n=2). The AR gene profiles of these isolates were identified by polymerase chain reaction (PCR). The DNA microarray consisted of 155 and 133 AR and VF gene probes, respectively. Results were compared with the commercially available Identibac AMR-ve Array Tube. Hybridisation results indicated that there was excellent correlation between PCR and array results for AR and VF genes. Genes conferring resistance to each antibiotic class were identified by the DNA array. Unusual resistance genes were also identified, such as bla(SHV-5) in a bla(OXA-23)-positive carbapenem-resistant A. baumannii. The phylogenetic group of each E. coli isolate was verified by the array. These data demonstrate that it is possible to screen simultaneously for all important classes of mobile AR and VF genes in Enterobacteriaceae and non-Enterobacteriaceae whilst also assigning a correct phylogenetic group to E. coli isolates. Therefore, it is feasible to test clinical Gram-negative bacteria for all known AR genes and to provide important information regarding pathogenicity simultaneously.
Funded by: Medical Research Council
International journal of antimicrobial agents 2010;35;6;593-8
The genome of a songbird.
The Genome Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA. firstname.lastname@example.org
The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken - the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D013704/1, BB/E010652/1, BB/F007590/1, BBE0175091, BBS/E/I/00001425; Howard Hughes Medical Institute; Medical Research Council: MC_U137761446; NHGRI NIH HHS: R01 HG002939, U54 HG003079; NIDA NIH HHS: P30 DA018310; NIDCD NIH HHS: R01 DC007218; NIGMS NIH HHS: R01 GM059290, R01 GM085233, R01 GM59290; NINDS NIH HHS: R01 NS045264, R01NS051820
Genetic variants influencing circulating lipid levels and risk of coronary artery disease.
Genetics Division, GlaxoSmithKline R&D, King of Prussia, PA, USA.
Objective: Genetic studies might provide new insights into the biological mechanisms underlying lipid metabolism and risk of CAD. We therefore conducted a genome-wide association study to identify novel genetic determinants of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides.
Methods and results: We combined genome-wide association data from 8 studies, comprising up to 17 723 participants with information on circulating lipid concentrations. We did independent replication studies in up to 37 774 participants from 8 populations and also in a population of Indian Asian descent. We also assessed the association between single-nucleotide polymorphisms (SNPs) at lipid loci and risk of CAD in up to 9 633 cases and 38 684 controls. We identified 4 novel genetic loci that showed reproducible associations with lipids (probability values, 1.6×10(-8) to 3.1×10(-10)). These include a potentially functional SNP in the SLC39A8 gene for HDL-C, an SNP near the MYLIP/GMPR and PPP1R3B genes for LDL-C, and at the AFF1 gene for triglycerides. SNPs showing strong statistical association with 1 or more lipid traits at the CELSR2, APOB, APOE-C1-C4-C2 cluster, LPL, ZNF259-APOA5-A4-C3-A1 cluster and TRIB1 loci were also associated with CAD risk (probability values, 1.1×10(-3) to 1.2×10(-9)).
Conclusions: We have identified 4 novel loci associated with circulating lipids. We also show that in addition to those that are largely associated with LDL-C, genetic loci mainly associated with circulating triglycerides and HDL-C are also associated with risk of CAD. These findings potentially provide new insights into the biological mechanisms underlying lipid metabolism and CAD risk.
Funded by: British Heart Foundation: PG/08/094, PG/08/094/26019; Medical Research Council: G0000934, G0401527, G0500539, G0601966, G0700931, G0701863, G0801566, MC_QA137934, MC_U105630924, MC_U106179471, MC_U106188470; NHLBI NIH HHS: 5R01HL087679-02; NIDDK NIH HHS: R01 DK062370, U01 DK062418; NIMH NIH HHS: 1RL1MH083268-01; Wellcome Trust: 068545/Z/02, 077016/Z/05/Z, 079895, GR069224
Arteriosclerosis, thrombosis, and vascular biology 2010;30;11;2264-76
Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease - IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes - although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.
Funded by: Arthritis Research UK: 17552, 18475; British Heart Foundation: RG/09/012/28096; Chief Scientist Office: CZB/4/540, ETM/137, ETM/75; Medical Research Council: G0000934, G0400874, G0500115, G0501942, G0600329, G0600705, G0700491, G0701003, G0701420, G0701810, G0701810(85517), G0800383, G0800509, G0800759, G19/9, G90/106, G9521010, MC_UP_A390_1107; Wellcome Trust: 061858, 083948, 089989, 090532
Distinct variants at LIN28B influence growth in height from birth to adulthood.
Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland. email@example.com
We have studied the largely unknown genetic underpinnings of height growth by using a unique resource of longitudinal childhood height data available in Finnish population cohorts. After applying GWAS mapping of potential genes influencing pubertal height growth followed by further characterization of the genetic effects on complete postnatal growth trajectories, we have identified strong association between variants near LIN28B and pubertal growth (rs7759938; female p = 4.0 x 10(-9), male p = 1.5 x 10(-4), combined p = 5.0 x 10(-11), n = 5038). Analysis of growth during early puberty confirmed an effect on the timing of the growth spurt. Correlated SNPs have previously been implicated as influencing both adult stature and age at menarche, the same alleles associating with taller height and later age of menarche in other studies as with later pubertal growth here. Additionally, a partially correlated LIN28B SNP, rs314277, has been associated previously with final height. Testing both rs7759938 and rs314277 (pairwise r(2) = 0.29) for independent effects on postnatal growth in 8903 subjects indicated that the pubertal timing-associated marker rs7759938 affects prepubertal growth in females (p = 7 x 10(-5)) and final height in males (p = 5 x 10(-4)), whereas rs314277 has sex-specific effects on growth (p for interaction = 0.005) that were distinct from those observed at rs7759938. In conclusion, partially correlated variants at LIN28B tag distinctive, complex, and sex-specific height-growth-regulating effects, influencing the entire period of postnatal growth. These findings imply a critical role for LIN28B in the regulation of human growth.
Funded by: Medical Research Council: G0500539; Wellcome Trust: 89061/Z/09/Z, WT089062
American journal of human genetics 2010;86;5;773-82
The activating mutation R201C in GNAS promotes intestinal tumourigenesis in Apc(Min/+) mice through activation of Wnt and ERK1/2 MAPK pathways.
Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.
Somatically acquired, activating mutations of GNAS, the gene encoding the stimulatory G-protein Gsalpha subunit, have been identified in kidney, thyroid, pituitary, Leydig cell, adrenocortical and, more recently, in colorectal tumours, suggesting that mutations such as R201C may be oncogenic in these tissues. To study the role of GNAS in intestinal tumourigenesis, we placed GNAS R201C under the control of the A33-antigen promoter (Gpa33), which is almost exclusively expressed in the intestines. The GNAS R201C mutation has been shown to result in the constitutive activation of Gsalpha and adenylate cyclase and to lead to the autonomous synthesis of cyclic adenosine monophosphate (cAMP). Gpa33(tm1(GnasR201C)Wtsi/+) mice showed significantly elevated cAMP levels and a compensatory upregulation of cAMP-specific phosphodiesterases in the intestinal epithelium. GNAS R201C alone was not sufficient to induce tumourigenesis by 12 months, but there was a significant increase in adenoma formation when Gpa33(tm1(GnasR201C)Wtsi/+) mice were bred onto an Apc(Min/+) background. GNAS R201C expression was associated with elevated expression of Wnt and extracellular signal-regulated kinase 1/2 mitogen-activated protein kinase (ERK1/2 MAPK) pathway target genes, increased phosphorylation of ERK1/2 MAPK and increased immunostaining for the proliferation marker Ki67. Furthermore, the effects of GNAS R201C on the Wnt pathway were additive to the inactivation of Apc. Our data strongly suggest that activating mutations of GNAS cooperate with inactivation of APC and are likely to contribute to colorectal tumourigenesis.
Funded by: Cancer Research UK: A6997, A8784; Wellcome Trust: 082356
Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.
Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK. firstname.lastname@example.org
The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.
Funded by: Cancer Research UK; Wellcome Trust
Nucleic acids research 2010;38;14;e151
Plasmodium falciparum ATP6 not under selection during introduction of artemisinin combination therapy in Peru.
Antimicrobial agents and chemotherapy 2010;54;5;2280; author reply 2280-1
Commercially available outbred mice for genome-wide association studies.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 Kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes.
Funded by: Medical Research Council: G0800024; Wellcome Trust: 079912
PLoS genetics 2010;6;9;e1001085
Racial/ethnic differences in association of fasting glucose-associated genomic loci with fasting glucose, HOMA-B, and impaired fasting glucose in the U.S. adult population.
Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, Georgia, USA. email@example.com
Objective: To estimate allele frequencies and the marginal and combined effects of novel fasting glucose (FG)-associated single nucleotide polymorphisms (SNPs) on FG levels and on risk of impaired FG (IFG) among non-Hispanic white, non-Hispanic black, and Mexican Americans.
Research design and methods: DNA samples from 3,024 adult fasting participants in the National Health and Nutrition Examination Survey (NHANES) III (1991-1994) were genotyped for 16 novel FG-associated SNPs in multiple genes. We determined the allele frequencies and influence of these SNPs alone and in a weighted genetic risk score on FG, homeostasis model assessment of β-cell function (HOMA-B), and IFG by race/ethnicity, while adjusting for age and sex.
Results: All allele frequencies varied significantly by race/ethnicity. A weighted genetic risk score, based on 16 SNPs, was associated with a 0.022 mmol/l (95% CI 0.009-0.035), 0.036 mmol/l (0.019-0.052), and 0.033 mmol/l (0.020-0.046) increase in FG levels per risk allele among non-Hispanic whites, non-Hispanic blacks, and Mexican Americans, respectively. Adjusted odds ratios for IFG were 1.78 for non-Hispanic whites (95% CI 1.00-3.17), 2.40 for non-Hispanic blacks (1.07-5.37), and 2.39 for Mexican Americans (1.37-4.14) when we compared the highest with the lowest quintiles of genetic risk score (P=0.365 for testing heterogeneity of effect across race/ethnicity).
Conclusions: We conclude that allele frequencies of 16 novel FG-associated SNPs vary significantly by race/ethnicity, but the influence of these SNPs on FG levels, HOMA-B, and IFG was generally consistent across all racial/ethnic groups.
Funded by: NIDDK NIH HHS: K23 DK65978, K24 DK080140, R01 DK078616
Diabetes care 2010;33;11;2370-7
Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK.
Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols.
Funded by: Wellcome Trust
Bioinformatics (Oxford, England) 2010;26;19;2474-6
Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA. firstname.lastname@example.org
Background: Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.
Results: We have developed a pipeline to detect human unitary pseudogenes through analyzing the global inventory of orthologs between the human genome and its mammalian relatives. We focus on gene losses along the human lineage after the divergence from rodents about 75 million years ago. In total, we identify 76 unitary pseudogenes, including previously annotated ones, and many novel ones. By comparing each of these to its functioning ortholog in other mammals, we can approximately date the creation of each unitary pseudogene (that is, the gene 'death date') and show that for our group of 76, the functional genes appear to be disabled at a fairly uniform rate throughout primate evolution - not all at once, correlated, for instance, with the 'Alu burst'. Furthermore, we identify 11 unitary pseudogenes that are polymorphic - that is, they have both nonfunctional and functional alleles currently segregating in the human population. Comparing them with their orthologs in other primates, we find that two of them are in fact pseudogenes in non-human primates, suggesting that they represent cases of a gene being resurrected in the human lineage.
Conclusions: This analysis of unitary pseudogenes provides insights into the evolutionary constraints faced by different organisms and the timescales of functional gene loss in humans.
Funded by: NHGRI NIH HHS: U54 HG004555; NLM NIH HHS: 1K99LM009770-01; Wellcome Trust: 077198
Genome biology 2010;11;3;R26