Sanger Institute - Publications 2008
Number of papers published in 2008: 142
EYS, encoding an ortholog of Drosophila spacemaker, is mutated in autosomal recessive retinitis pigmentosa.
Department of Molecular Genetics, Institute of Ophthalmology, London EC1V 9EL, UK.
Using a positional cloning approach supported by comparative genomics, we have identified a previously unreported gene, EYS, at the RP25 locus on chromosome 6q12 commonly mutated in autosomal recessive retinitis pigmentosa. Spanning over 2 Mb, this is the largest eye-specific gene identified so far. EYS is independently disrupted in four other mammalian lineages, including that of rodents, but is well conserved from Drosophila to man and is likely to have a role in the modeling of retinal architecture.
Funded by: Medical Research Council: MC_U137761446; Wellcome Trust: 077008
Nature genetics 2008;40;11;1285-7
Contemporary approaches for modifying the mouse genome.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
The mouse is a premier experimental organism that has contributed significantly to our understanding of vertebrate biology. Manipulation of the mouse genome via embryonic stem (ES) cell technology makes it possible to engineer an almost limitless repertoire of mutations to model human disease and assess gene function. In this review we outline recent advances in mouse experimental genetics and provide a "how-to" guide for those people wishing to access this technology. We also discuss new technologies, such as transposon-mediated mutagenesis, and resources of targeting vectors and ES cells, which are likely to dramatically accelerate the pace with which we can assess gene function in vivo, and the progress of forward and reverse genetic screens in mice.
Funded by: Cancer Research UK; Wellcome Trust
Physiological genomics 2008;34;3;225-38
Genomic-scale prioritization of drug targets: the TDR Targets database.
Instituto de Investigaciones Biotecnológicas, Universidad Nacional de General San Martín, San Martín 1650, Buenos Aires, Argentina. email@example.com
The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.
Funded by: NIGMS NIH HHS: R01 GM054762
Nature reviews. Drug discovery 2008;7;11;900-7
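The weighting idea described in the abstract above can be illustrated with a minimal Python sketch. The criteria names, weights and gene names here are hypothetical and chosen only for illustration; this is not the TDR Targets scoring scheme or its API.

```python
from typing import Dict, List, Tuple


def rank_targets(
    targets: Dict[str, Dict[str, bool]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank candidate targets by the sum of weights of the criteria they meet.

    `targets` maps a (hypothetical) gene name to a dict of criterion -> bool.
    `weights` maps a criterion name to the weight the user assigns to it.
    """
    scored = []
    for name, criteria in targets.items():
        # Add up the weights of every criterion this target satisfies.
        score = sum(w for c, w in weights.items() if criteria.get(c, False))
        scored.append((name, score))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical example data.
example_targets = {
    "geneA": {"essential": True, "druggable": True},
    "geneB": {"essential": True},
}
example_weights = {"essential": 2.0, "druggable": 3.0}
print(rank_targets(example_targets, example_weights))
```

Changing the weights changes the ranking, which is the point of letting the user decide how much each kind of evidence should count.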
Data growth and its impact on the SCOP database: new developments.
MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 0QH, UK.
The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
Funded by: Medical Research Council: G0100305, MC_U105192716; NIGMS NIH HHS: R01 GM073109, R01-GM073109; Wellcome Trust: 077198
Nucleic acids research 2008;36;Database issue;D419-25
Consistently replicating locus linked to migraine on 10q22-q23.
Biomedicum Helsinki, Research Program in Molecular Medicine, University of Helsinki, 00290 Helsinki, Finland.
Here, we present the results of two genome-wide scans in two diverse populations in which a consistent use of recently introduced migraine-phenotyping methods detects and replicates a locus on 10q22-q23, with an additional independent replication. No genetic variants have been convincingly established in migraine, and although several loci have been reported, none of them has been consistently replicated. We employed the three known migraine-phenotyping methods (clinical end diagnosis, latent-class analysis, and trait-component analysis) with robust multiple testing correction in a large sample set of 1675 individuals from 210 migraine families from Finland and Australia. Genome-wide multipoint linkage analysis that used the Kong and Cox exponential model in Finns detected a locus on 10q22-q23 with highly significant evidence of linkage (LOD 7.68 at 103 cM in female-specific analysis). The Australian sample showed a LOD score of 3.50 at the same locus (100 cM), as did the independent Finnish replication study (LOD score 2.41, at 102 cM). In addition, four previously reported loci on 8q21, 14q21, 18q12, and Xp21 were also replicated. A shared-segment analysis of 10q22-q23 linked Finnish families identified a 1.6-9.5 cM segment, centered on 101 cM, which shows in-family homology in 95% of affected Finns. This region was further studied with 1323 SNPs. Although no significant association was observed, four regions warranting follow-up studies were identified. These results support the use of symptomology-based phenotyping in migraine and suggest that the 10q22-q23 locus probably contains one or more migraine susceptibility variants.
Funded by: NCRR NIH HHS: U54 RR020278; NIAAA NIH HHS: AA013320, AA013326, AA014041, AA07728, AA11998, P50 AA011998, R01 AA007535, R01 AA007728, R01 AA010249, R01 AA013320, R01 AA013326, R01 AA014041, R37 AA007728; NINDS NIH HHS: R01 NS037675, R01 NS37675
American journal of human genetics 2008;82;5;1051-63
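As a reading aid for the LOD scores quoted in the abstract above: a LOD score is the base-10 logarithm of a likelihood ratio (linkage versus no linkage), so it can be converted back to a ratio by exponentiation. The short Python sketch below does only that conversion; the function name is illustrative, and it does not reproduce the linkage analysis itself.

```python
def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a LOD score (log10 of a likelihood ratio) back to the ratio."""
    return 10.0 ** lod


# LOD scores as reported in the abstract.
for label, lod in [
    ("Finnish scan", 7.68),
    ("Australian scan", 3.50),
    ("Finnish replication", 2.41),
]:
    ratio = lod_to_likelihood_ratio(lod)
    print(f"{label}: LOD {lod:.2f} -> likelihood ratio about {ratio:.3g}")
```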
Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis.
Department of Genetics, University of Leicester, Leicester, United Kingdom.
The human Y chromosome shows frequent structural variants, some of which are selectively neutral, while others cause impaired fertility due to the loss of spermatogenic genes. The large-scale use of multiple Y-chromosomal microsatellites in forensic and population genetic studies can reveal such variants, through the absence or duplication of specific markers in haplotypes. We describe Y chromosomes in apparently normal males carrying null and duplicated alleles at the microsatellite DYS448, which lies in the proximal part of the azoospermia factor c (AZFc) region, important in spermatogenesis, and made up of "ampliconic" repeats that act as substrates for nonallelic homologous recombination (NAHR). Physical mapping in 26 DYS448 deletion chromosomes reveals that only three cases belong to a previously described class, representing independent occurrences of an approximately 1.5-Mb deletion mediated by recombination between the b1 and b3 repeat units. The remainder belong to five novel classes; none appears to be mediated through homologous recombination, and all remove some genes, but are likely to be compatible with normal fertility. A combination of deletion analysis with binary-marker and microsatellite haplotyping shows that the 26 deletions represent nine independent events. Nine DYS448 duplication chromosomes can be explained by four independent events. Some lineages have risen to high frequency in particular populations, in particular a deletion within haplogroup (hg) C*(xC3a,C3c) found in 18 Asian males. The nonrandom phylogenetic distribution of duplication and deletion events suggests possible structural predisposition to such mutations in hgs C and G.
Funded by: Wellcome Trust: 057559, 077009
Human mutation 2008;29;10;1171-80
A robust statistical method for case-control association testing with copy number variation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
Funded by: Wellcome Trust: 061860
Nature genetics 2008;40;10;1245-52
Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease.
Bioinformatics and Statistical Genetics, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
Several risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further, we combined data from three studies on Crohn's disease (a total of 3,230 cases and 4,829 controls) and carried out replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 additional loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1 and ITLN1. The expanded molecular understanding of the basis of this disease offers promise for informed therapeutic development.
Funded by: Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0600329, G0800759; NIAID NIH HHS: AI06277, R01 AI062773, R01 AI062773-02; NIDDK NIH HHS: DK064869, DK62413, DK62420, DK62422, DK62423, DK62429, DK62431, DK62432, P30 DK040561, P30 DK040561-13, P30 DK063491, P30 DK063491-019004, P30 DK063491-029004, P30 DK063491-039004, P30 DK063491-049004, P30 DK063491-05, R01 DK064869, R01 DK064869-04, U01 DK062413, U01 DK062413-06, U01 DK062420, U01 DK062420-01, U01 DK062420-02, U01 DK062420-03, U01 DK062420-04, U01 DK062420-05, U01 DK062420-06, U01 DK062422, U01 DK062422-07, U01 DK062423, U01 DK062423-06, U01 DK062429, U01 DK062429-07, U01 DK062431, U01 DK062431-06, U01 DK062432; Wellcome Trust: 068545/Z/02
Nature genetics 2008;40;8;955-62
Population-specific risk of type 2 diabetes conferred by HNF4A P2 promoter variants: a lesson for replication studies.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. firstname.lastname@example.org
Objective: Single nucleotide polymorphisms (SNPs) in the P2 promoter region of HNF4A were originally shown to be associated with predisposition for type 2 diabetes in Finnish, Ashkenazi, and, more recently, Scandinavian populations, but they generated conflicting results in additional populations. We aimed to investigate whether data from a large-scale mapping approach would replicate this association in novel Ashkenazi samples and in U.K. populations and whether these data would allow us to refine the association signal.
Research design and methods: Using a dense linkage disequilibrium map of 20q, we selected SNPs from a 10-Mb interval centered on HNF4A. In a staged approach, we first typed 4,608 SNPs in case-control populations from four U.K. populations and an Ashkenazi population (n = 2,516). In phase 2, a subset of 763 SNPs was genotyped in 2,513 additional samples from the same populations.
Results: Combined analysis of both phases demonstrated association between HNF4A P2 SNPs (rs1884613 and rs2144908) and type 2 diabetes in the Ashkenazim (n = 991; P < 1.6 × 10^-6). Importantly, these associations are significant in a subset of Ashkenazi samples (n = 531) not previously tested for association with P2 SNPs (odds ratio [OR] approximately 1.7; P < 0.002), thus providing replication within the Ashkenazim. In the U.K. populations, this association was not significant (n = 4,022; P > 0.5), and the estimate for the OR was much smaller (OR 1.04; 95% CI 0.91-1.19).
Conclusions: These data indicate that the risk conferred by HNF4A P2 is significantly different between U.K. and Ashkenazi populations (P < 0.00007), suggesting that the underlying causal variant remains unidentified. Interactions with other genetic or environmental factors may also contribute to this difference in risk between populations.
Funded by: Medical Research Council: MC_U106179471; NIDDK NIH HHS: R01 DK049583; PHS HHS: R01K049583; Wellcome Trust: 076113, 077016, 079557
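The U.K. odds ratio and confidence interval reported in the Results above can be checked for internal consistency with a short calculation. The Python sketch below assumes the interval is a Wald-type interval, symmetric on the log-odds scale with a critical value of 1.96; the abstract does not state how the interval was computed, so this is an assumption, and the function names are illustrative.

```python
import math


def wald_se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Recover the standard error of log(OR) from a CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def two_sided_p(odds_ratio: float, se: float) -> float:
    """Two-sided normal p-value for the null hypothesis OR = 1."""
    z = math.log(odds_ratio) / se
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values reported for the U.K. populations: OR 1.04, 95% CI 0.91-1.19.
se = wald_se_from_ci(0.91, 1.19)
p = two_sided_p(1.04, se)
print(f"SE of log(OR) about {se:.3f}, two-sided p about {p:.2f}")
```

Under these assumptions the implied p-value is above 0.5, in line with the "P > 0.5" reported for the U.K. samples.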
Re-evaluation of putative rheumatoid arthritis susceptibility genes in the post-genome wide association study era and hypothesis of a key pathway underlying susceptibility.
Arc-Epidemiology Unit, Stopford Building, The University of Manchester, Manchester, UK. email@example.com
Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case-control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.
Funded by: Arthritis Research UK: 17552, 18475; Medical Research Council: G0000934; Wellcome Trust: 090532
Human molecular genetics 2008;17;15;2274-9
Rheumatoid arthritis susceptibility loci at chromosomes 10p15, 12q13 and 22q13.
Arthritis Research Campaign, Epidemiology Unit, The University of Manchester, Manchester, UK. firstname.lastname@example.org
The WTCCC study identified 49 SNPs putatively associated with rheumatoid arthritis at P = 1 × 10^-4 to 1 × 10^-5 (tier 3). Here we show that three of these SNPs, mapping to chromosome 10p15 (rs4750316), 12q13 (rs1678542) and 22q13 (rs3218253), are also associated (trend P = 4 × 10^-5, P = 4 × 10^-4 and P = 4 × 10^-4, respectively) in a validation study of 4,106 individuals with rheumatoid arthritis and an expanded reference group of 11,238 subjects, confirming them as true susceptibility loci in individuals of European ancestry.
Funded by: Arthritis Research UK: 17552, 18475; Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0600329; Wellcome Trust: 068545/Z/02, 090532
Nature genetics 2008;40;10;1156-9
A novel 154-bp deletion in the human mitochondrial DNA control region in healthy individuals.
Molecular Medicine Laboratory, Rambam Health Care Campus, Haifa, Israel. GenoPubs@ngs.org
The biological role of the mitochondrial DNA (mtDNA) control region in mtDNA replication remains unclear. In a worldwide survey of mtDNA variation in the general population, we have identified a novel large control region deletion spanning positions 16154 to 16307 (m.16154_16307del154). The population prevalence of this deletion is low, since it was only observed in 1 out of over 120,000 mtDNA genomes studied. The deletion is present in a nonheteroplasmic state, and was transmitted by a mother to her two sons with no apparent past or present disease conditions. The identification of this large deletion in healthy individuals challenges the current view of the control region as playing a crucial role in the regulation of mtDNA replication, and supports the existence of a more complex system of multiple or epigenetically-determined replication origins.
Funded by: Wellcome Trust: 077009
Human mutation 2008;29;12;1387-91
The dawn of human matrilineal diversity.
Molecular Medicine Laboratory, Rambam Health Care Campus, Haifa 31096, Israel. email@example.com
The quest to explain demographic history during the early part of human evolution has been limited because of the scarce paleoanthropological record from the Middle Stone Age. To shed light on the structure of the mitochondrial DNA (mtDNA) phylogeny at the dawn of Homo sapiens, we constructed a matrilineal tree composed of 624 complete mtDNA genomes from sub-Saharan Hg L lineages. We paid particular attention to the Khoi and San (Khoisan) people of South Africa because they are considered to be a unique relic of hunter-gatherer lifestyle and to carry paternal and maternal lineages belonging to the deepest clades known among modern humans. Both the tree phylogeny and coalescence calculations suggest that Khoisan matrilineal ancestry diverged from the rest of the human mtDNA pool 90,000-150,000 years before present (ybp) and that at least five additional, currently extant maternal lineages existed during this period in parallel. Furthermore, we estimate that a minimum of 40 other evolutionarily successful lineages flourished in sub-Saharan Africa during the period of modern human dispersal out of Africa approximately 60,000-70,000 ybp. Only much later, at the beginning of the Late Stone Age, about 40,000 ybp, did introgression of additional lineages occur into the Khoisan mtDNA pool. This process was further accelerated during the recent Bantu expansions. Our results suggest that the early settlement of humans in Africa was already matrilineally structured and involved small, separately evolving isolated populations.
Funded by: Wellcome Trust
American journal of human genetics 2008;82;5;1130-40
Genomic 'valleys of death'.
Nature reviews. Microbiology 2008;6;4;260-1
Genome of the actinomycete plant pathogen Clavibacter michiganensis subsp. sepedonicus suggests recent niche adaptation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, United Kingdom.
Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome.
Journal of bacteriology 2008;190;6;2150-60
Susceptibility loci for intracranial aneurysm in European and Japanese populations.
Department of Neurosurgery, Neurobiology, Yale Center for Human Genetics and Genomics, Yale University School of Medicine, New Haven, CT 06510, USA.
Stroke is the world's third leading cause of death. One cause of stroke, intracranial aneurysm, affects approximately 2% of the population and accounts for 500,000 hemorrhagic strokes annually in mid-life (median age 50), most often resulting in death or severe neurological impairment. The pathogenesis of intracranial aneurysm is unknown, and because catastrophic hemorrhage is commonly the first sign of disease, early identification is essential. We carried out a multistage genome-wide association study (GWAS) of Finnish, Dutch and Japanese cohorts including over 2,100 intracranial aneurysm cases and 8,000 controls. Genome-wide genotyping of the European cohorts and replication studies in the Japanese cohort identified common SNPs on chromosomes 2q, 8q and 9p that show significant association with intracranial aneurysm with odds ratios 1.24-1.36. The loci on 2q and 8q are new, whereas the 9p locus was previously found to be associated with arterial diseases, including intracranial aneurysm. Associated SNPs on 8q likely act via SOX17, which is required for formation and maintenance of endothelial cells, suggesting a role in development and repair of the vasculature; CDKN2A at 9p may have a similar role. These findings have implications for the pathophysiology, diagnosis and therapy of intracranial aneurysm.
Funded by: Howard Hughes Medical Institute; NINDS NIH HHS: R01 NS057756, U24 NS051869; Wellcome Trust: 089062
Nature genetics 2008;40;12;1472-7
Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
It is a major challenge to develop effective sequence database search algorithms to translate molecular weight and fragment mass information obtained from tandem mass spectrometry into high quality peptide and protein assignments. We investigated the peptide identification performance of Mascot and X!Tandem for mass tolerance settings common for low and high accuracy mass spectrometry. We demonstrated that sensitivity and specificity of peptide identification can vary substantially for different mass tolerance settings, but this effect was more significant for Mascot. We present an adjusted Mascot threshold, which allows the user to freely select the best trade-off between sensitivity and specificity. The adjusted Mascot threshold was compared with the default Mascot and X!Tandem scoring thresholds and shown to be more sensitive at the same false discovery rates for both low and high accuracy mass spectrometry data.
Funded by: Wellcome Trust: 077198
Molecular & cellular proteomics : MCP 2008;7;5;962-70
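The trade-off between sensitivity and false discovery rate discussed in the abstract above can be illustrated with a generic threshold search. The Python sketch below uses the widely used target-decoy estimate of the false discovery rate (decoy hits divided by target hits at or above a score); the abstract does not say which estimator the authors used, so this is an assumption, and it is not the adjusted Mascot threshold itself. All names and example scores are hypothetical.

```python
from typing import Optional, Sequence


def threshold_at_fdr(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    max_fdr: float,
) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    FDR at threshold t is estimated as
    (number of decoy scores >= t) / (number of target scores >= t).
    Returns None if no threshold satisfies the limit.
    """
    best = None
    # Try each observed target score as a threshold, from strictest to loosest.
    for t in sorted(set(target_scores), reverse=True):
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        if n_decoy / n_target <= max_fdr:
            # A lower passing threshold accepts more peptides (more sensitive).
            best = t
    return best


# Hypothetical peptide scores against target and decoy databases.
targets = [50, 45, 40, 30, 20, 10]
decoys = [25, 15, 5]
print(threshold_at_fdr(targets, decoys, max_fdr=0.25))
```

Lowering `max_fdr` pushes the threshold up and accepts fewer identifications; raising it does the opposite.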
Identification of variation in the platelet transcriptome associated with glycoprotein 6 haplotype.
Department of Haematology, University of Cambridge and National Health Service Blood and Transplant, Cambridge, UK.
Platelet Glycoprotein VI (GPVI) is the activatory collagen signalling receptor that transmits an outside-in signal via the FcR gamma-chain. In Caucasians two GP6 haplotypes have been identified which encode GPVI isoforms that differ by five amino-acids. The minor haplotype is associated with a modest but statistically significant reduction in GPVI abundance and reduced downstream signalling events. As GPVI is also expressed on megakaryocytes, different GPVI isoforms may imprint on the platelet transcriptome. We investigated the association of GP6 haplotype with transcription by comparing the transcriptomes of platelets from individuals homozygous for the major ('a') and minor ('b') haplotypes to identify differentially expressed (DE) transcripts. Platelet RNA was isolated from apheresis concentrates from 16 'aa' donors and eight 'bb' donors. mRNA was amplified using a template-switching PCR based protocol and fluorescently labelled. Samples were randomly paired both within and between haplotypes and compared on a cDNA microarray. No consistently DE transcripts were identified within the 'aa' haplotype but 52 significantly DE transcripts were observed between haplotypes. Generally the fold differences were low (two to four-fold) but were confirmed by qRT-PCR for selected transcripts (TUBB1, P = 0.0004; VWF, P = 0.0126). The results of this study indicate that there are subtle differences between the platelet transcriptomes of individuals who differ by GP6 haplotype. The identification of DE genes may identify critical pathways and nodes not previously known to be involved in platelet development and function.
Funded by: Medical Research Council: MC_U105261167; Wellcome Trust
Large-scale screening for novel low-affinity extracellular protein interactions.
Cell Surface Signalling Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, United Kingdom.
Extracellular protein-protein interactions are essential for both intercellular communication and cohesion within multicellular organisms. Approximately a fifth of human genes encode membrane-tethered or secreted proteins, but they are largely absent from recent large-scale protein interaction datasets, making current interaction networks biased and incomplete. This discrepancy is due to the unsuitability of popular high-throughput methods to detect extracellular interactions because of the biochemical intractability of membrane proteins and their interactions. For example, cell surface proteins contain insoluble hydrophobic transmembrane regions, and their extracellular interactions are often highly transient, having half-lives of less than a second. To detect transient extracellular interactions on a large scale, we developed AVEXIS (avidity-based extracellular interaction screen), a high-throughput assay that overcomes these technical issues and can detect very transient interactions (half-lives ≤ 0.1 sec) with a low false-positive rate. We used it to systematically screen for receptor-ligand pairs within the zebrafish immunoglobulin superfamily and identified novel ligands for both well-known and orphan receptors. Genes encoding receptor-ligand pairs were often clustered phylogenetically and expressed in the same or adjacent tissues, immediately implying their involvement in similar biological processes. Using AVEXIS, we have determined the first systematic low-affinity extracellular protein interaction network, supported by independent biological data. This technique will now allow large-scale extracellular protein interaction mapping in a broad range of experimental contexts.
Funded by: Wellcome Trust: 087656
Genome research 2008;18;4;622-30
Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing.
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
During the clonal expansion of cancer from an ancestral cell with an initiating oncogenic mutation to symptomatic neoplasm, the occurrence of somatic mutations (both driver and passenger) can be used to track the on-going evolution of the neoplasm. All subclones within a cancer are phylogenetically related, with the prevalence of each subclone determined by its evolutionary fitness and the timing of its origin relative to other subclones. Recently developed massively parallel sequencing platforms promise the ability to detect rare subclones of genetic variants without a priori knowledge of the mutations involved. We used ultra-deep pyrosequencing to investigate intraclonal diversification at the Ig heavy chain locus in 22 patients with B-cell chronic lymphocytic leukemia. Analysis of a non-polymorphic control locus revealed artifactual insertions and deletions resulting from sequencing errors and base substitutions caused by polymerase misincorporation during PCR amplification. We developed an algorithm to differentiate genuine haplotypes of somatic hypermutations from such artifacts. This proved capable of detecting multiple rare subclones with frequencies as low as 1 in 5000 copies and allowed the characterization of phylogenetic interrelationships among subclones within each patient. This study demonstrates the potential for ultra-deep resequencing to recapitulate the dynamics of clonal evolution in cancer cell populations.
Funded by: Wellcome Trust: 088340
Proceedings of the National Academy of Sciences of the United States of America 2008;105;35;13081-6
Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing.
Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.
Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
Funded by: Wellcome Trust: 077012, 088340
Nature genetics 2008;40;6;722-9
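The idea in the abstract above of examining read pairs that do not align correctly with respect to each other can be sketched as a simple classifier. The Python sketch below assumes a library in which a normal pair maps to the same chromosome with the left read on the + strand, the right read on the - strand, and a separation no larger than a hypothetical `max_insert`. The category labels are suggestive only, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Alignment of both ends of one DNA fragment (illustrative structure)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, max_insert: int = 500) -> str:
    """Label a read pair as concordant or by the kind of discordance it shows."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by coordinate so "left" and "right" are well defined.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        # Both reads on the same strand: consistent with an inversion.
        return "inversion-like"
    if left_strand == "-":
        # Reads point away from each other: consistent with a tandem duplication.
        return "tandem-duplication-like"
    if right_pos - left_pos > max_insert:
        # Correct orientation but too far apart: consistent with a deletion.
        return "deletion-like"
    return "concordant"


print(classify_pair(ReadPair("1", 100, "+", "1", 5000, "-")))
```

In practice a single discordant pair is weak evidence; callers of such a function would normally require several independent pairs supporting the same event before reporting it.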
Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Motivation: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.
Results: Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.
Availability: Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/
Funded by: Wellcome Trust: 082372
Bioinformatics (Oxford, England) 2008;24;23;2672-6
Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database.
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.
Funded by: Wellcome Trust: 062023, 077198, 085532
Nucleic acids research 2008;36;Database issue;D5-12
Identifying protein domains with the Pfam database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit contains detailed information on how to access and utilize the information present in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally, are presented.
Funded by: Wellcome Trust: 087656
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 2008;Chapter 2;Unit 2.5
Mapping multiprotein complexes by affinity purification and mass spectrometry.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
The combination of affinity purification and tandem mass spectrometry (MS) has emerged as a powerful approach to delineate biological processes. In particular, the use of epitope tags has allowed this approach to become scalable and has bypassed difficulties associated with generation of antibodies. Single epitope tags and tandem affinity purification (TAP) tags have been used to systematically map protein complexes, generating protein interaction data at a near proteome-wide scale. Recent developments in the design of tags, optimisation of purification conditions, experimental design and data analysis have greatly improved the sensitivity and specificity of this approach. Concomitant developments in MS, including high accuracy and high-throughput instrumentation together with quantitative MS methods, have facilitated large-scale and comprehensive analysis of multiprotein complexes.
Current opinion in biotechnology 2008;19;4;324-30
Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.
Funded by: Wellcome Trust
Molecular & cellular proteomics : MCP 2008;7;7;1331-48
The complete genome, comparative and functional analysis of Stenotrophomonas maltophilia reveals an organism heavily shielded by drug resistance determinants.
Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
Background: Stenotrophomonas maltophilia is a nosocomial opportunistic pathogen of the Xanthomonadaceae. The organism has been isolated from both clinical and soil environments in addition to the sputum of cystic fibrosis patients and the immunocompromised. Whilst relatively distant phylogenetically, the closest sequenced relatives of S. maltophilia are the plant pathogenic xanthomonads.
Results: The genome of the bacteremia-associated isolate S. maltophilia K279a is 4,851,126 bp and of high G+C content. The sequence reveals an organism with a remarkable capacity for drug and heavy metal resistance. In addition to a number of genes conferring resistance to antimicrobial drugs of different classes via alternative mechanisms, nine resistance-nodulation-division (RND)-type putative antimicrobial efflux systems are present. Functional genomic analysis confirms a role in drug resistance for several of the novel RND efflux pumps. S. maltophilia possesses potentially mobile regions of DNA and encodes a number of pili and fimbriae likely to be involved in adhesion and biofilm formation that may also contribute to increased antimicrobial drug resistance.
Conclusion: The panoply of antimicrobial drug resistance genes and mobile genetic elements found suggests that the organism can act as a reservoir of antimicrobial drug resistance determinants in a clinical environment, which is an issue of considerable concern.
Funded by: Wellcome Trust
Genome biology 2008;9;4;R74
The RNA WikiProject: community annotation of RNA families.
The online encyclopedia Wikipedia has become one of the most important online references in the world and has a substantial and growing scientific content. A search of Google with many RNA-related keywords identifies a Wikipedia article as the top hit. We believe that the RNA community has an important and timely opportunity to maximize the content and quality of RNA information in Wikipedia. To this end, we have formed the RNA WikiProject (http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA) as part of the larger Molecular and Cellular Biology WikiProject. We have created over 600 new Wikipedia articles describing families of noncoding RNAs based on the Rfam database, and invite the community to update, edit, and correct these articles. The Rfam database now redistributes this Wikipedia content as the primary textual annotation of its RNA families. Users can, therefore, for the first time, directly edit the content of one of the major RNA databases. We believe that this Wikipedia/Rfam link acts as a functioning model for incorporating community annotation into molecular biology databases.
Funded by: Howard Hughes Medical Institute; NIGMS NIH HHS: R01 GM087721; Wellcome Trust: 077044
RNA (New York, N.Y.) 2008;14;12;2462-4
From gene expression to disease risk.
Nature genetics 2008;40;5;492-3
Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE).
Institute for Systems Biology, 1441 N 34th Street, Seattle, Washington 98103, USA.
One purpose of the biomedical literature is to report results in sufficient detail that the methods of data collection and analysis can be independently replicated and verified. Here we present reporting guidelines for gene expression localization experiments: the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). MISFISHIE is modeled after the Minimum Information About a Microarray Experiment (MIAME) specification for microarray experiments. Both guidelines define what information should be reported without dictating a format for encoding that information. MISFISHIE describes six types of information to be provided for each experiment: experimental design, biomaterials and treatments, reporters, staining, imaging data and image characterizations. This specification has benefited the consortium within which it was developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E025080/1; Medical Research Council: MC_U117532048, MC_U127527203; NHLBI NIH HHS: R33 HL073712; NIDDK NIH HHS: DK63328, DK63400, DK63481, DK63483, DK63630, R01 DK079798, R01 DK079798-01A2
Nature biotechnology 2008;26;3;305-12
X-linked protocadherin 19 mutations cause female-limited epilepsy and cognitive impairment.
Department of Genetic Medicine, Level 9 Rieger Building, Women's and Children's Hospital, 72 King William Road, North Adelaide, South Australia 5006, Australia.
Epilepsy and mental retardation limited to females (EFMR) is a disorder with an X-linked mode of inheritance and an unusual expression pattern. Disorders arising from mutations on the X chromosome are typically characterized by affected males and unaffected carrier females. In contrast, EFMR spares transmitting males and affects only carrier females. Aided by systematic resequencing of 737 X chromosome genes, we identified different protocadherin 19 (PCDH19) gene mutations in seven families with EFMR. Five mutations resulted in the introduction of a premature termination codon. Study of two of these demonstrated nonsense-mediated decay of PCDH19 mRNA. The two missense mutations were predicted to affect adhesiveness of PCDH19 through impaired calcium binding. PCDH19 is expressed in developing brains of human and mouse and is the first member of the cadherin superfamily to be directly implicated in epilepsy or mental retardation.
Funded by: NICHD NIH HHS: N01-HD-4-3368, N01-HD-4-3383; NIGMS NIH HHS: GM061354, P01 GM061354; NIMH NIH HHS: R01 MH 64547, R01 MH064547, R01 MH064547-01, R01 MH064547-01S1, R01 MH064547-02, R01 MH064547-02S1, R01 MH064547-03, R01 MH064547-04, R01 MH064547-05; Wellcome Trust
Nature genetics 2008;40;6;776-81
Efficient targeted transcript discovery via array-based normalization of RACE libraries.
Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain.
Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy indeed leads to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.
Funded by: NCI NIH HHS: N01-CO-12400, N01CO12400; NHGRI NIH HHS: U01 HG003147, U01 HG003147-01, U01 HG003147-02, U01 HG003147-02S1, U01 HG003147-02S2, U01 HG003147-02S3, U01 HG003150, U01 HG003150-01, U01 HG003150-02, U01 HG003150-03, U01 HG003150-03S1, U01 HG003150-03S2, U01HG003147, U01HG003150, U54 HG004555, U54 HG004557, U54 HG004557-01, U54 HG004557-02, U54 HG004557-02S1, U54 HG004557-03; Wellcome Trust: 077198
Nature methods 2008;5;7;629-35
NestedMICA as an ab initio protein motif discovery tool.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Background: Discovering overrepresented patterns in amino acid sequences is an important step in protein functional element identification. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription factor binding site motifs, to find short protein signals, and compared its performance with another popular protein motif finder, MEME. NestedMICA, an open source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In order to assess NestedMICA as a protein motif finder, we have tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested using a biologically authentic test set, where we evaluated its performance with respect to varying sequence length.
Results: Generally NestedMICA recovered most of the short (3-9 amino acid long) test protein motifs spiked into a test set of sequences at different frequencies. We showed that it can be used to find multiple motifs at the same time, too. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME.
Conclusion: NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences.
Availability: NestedMICA is available under the Lesser GPL open-source license from: http://www.sanger.ac.uk/Software/analysis/nmica/
Funded by: Wellcome Trust: 077198
BMC bioinformatics 2008;9;19
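The synthetic benchmark described in the NestedMICA abstract above (spiking motif instances into a set of protein sequences at a chosen frequency) can be sketched as follows. This is an illustrative assumption of how such a dataset might be built, not the authors' code: the function names, the fixed-string motif and the choice to overwrite residues rather than insert them are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, rng):
    """Generate a uniformly random protein sequence (a stand-in for real data)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def spike_motif(sequences, motif, fraction, seed=0):
    """Overwrite a random window with `motif` in roughly `fraction` of sequences.

    Returns the new sequence list and the indices that were actually spiked.
    Sequences shorter than the motif are left untouched.
    """
    rng = random.Random(seed)
    n_spiked = round(len(sequences) * fraction)
    chosen = set(rng.sample(range(len(sequences)), n_spiked))

    out, spiked = [], []
    for i, seq in enumerate(sequences):
        if i in chosen and len(seq) >= len(motif):
            start = rng.randrange(len(seq) - len(motif) + 1)
            seq = seq[:start] + motif + seq[start + len(motif):]
            spiked.append(i)
        out.append(seq)
    return out, spiked
```

Overwriting keeps every sequence at its original length, so sequence length and motif frequency can be varied independently. A motif finder is then scored on how well it recovers the spiked pattern from the returned sequences.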
A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis.
Wellcome Trust Cancer Research UK Gurdon Institute, and Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK.
DNA methylation is an indispensable epigenetic modification required for regulating the expression of mammalian genomes. Immunoprecipitation-based methods for DNA methylome analysis are rapidly shifting the bottleneck in this field from data generation to data analysis, necessitating the development of better analytical tools. In particular, an inability to estimate absolute methylation levels remains a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling. To address this issue, we developed a cross-platform algorithm, Bayesian tool for methylation analysis (Batman), for analyzing methylated DNA immunoprecipitation (MeDIP) profiles generated using oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). We developed the latter approach to provide a high-resolution whole-genome DNA methylation profile (DNA methylome) of a mammalian genome. Strong correlation of our data, obtained using mature human spermatozoa, with those obtained using bisulfite sequencing suggests that combining MeDIP-seq or MeDIP-chip with Batman provides a robust, quantitative and cost-effective functional genomic strategy for elucidating the function of DNA methylation.
Funded by: Cancer Research UK: C14303/A8646; Wellcome Trust: 077198, 083563, 084071
Nature biotechnology 2008;26;7;779-85
The evolution of the DLK1-DIO3 imprinted domain in mammals.
Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, United Kingdom.
A comprehensive, domain-wide comparative analysis of genomic imprinting between mammals that imprint and those that do not can provide valuable information about how and why imprinting evolved. The imprinting status, DNA methylation, and genomic landscape of the Dlk1-Dio3 cluster were determined in eutherian, metatherian, and prototherian mammals including tammar wallaby and platypus. Imprinting across the whole domain evolved after the divergence of eutherian from marsupial mammals and in eutherians is under strong purifying selection. The marsupial locus, at 1.6 megabases, is double the size of that of eutherians due to the accumulation of LINE repeats. Comparative sequence analysis of the domain in seven vertebrates determined evolutionarily conserved regions common to particular sub-groups and to all vertebrates. The emergence of Dlk1-Dio3 imprinting in eutherians has occurred on the maternally inherited chromosome and is associated with region-specific resistance to expansion by repetitive elements and the local introduction of noncoding transcripts including microRNAs and C/D small nucleolar RNAs. A recent mammal-specific retrotransposition event led to the formation of a completely new gene only in the eutherian domain, which may have driven imprinting at the cluster.
Funded by: Medical Research Council: G0400156
PLoS biology 2008;6;6;e135
Evolutionary expansion and anatomical specialization of synapse proteome complexity.
Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.
Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.
Funded by: Medical Research Council: G90/112, G90/93; Wellcome Trust: 077155
Nature neuroscience 2008;11;7;799-806
Evaluating the role of LPIN1 variation in insulin resistance, body weight, and human lipodystrophy in U.K. Populations.
Metabolic Disease Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
Objective: Loss of lipin 1 activity causes lipodystrophy and insulin resistance in the fld mouse, and LPIN1 expression and common genetic variation were recently suggested to influence adiposity and insulin sensitivity in humans. We aimed to conduct a comprehensive association study to clarify the influence of common LPIN1 variation on adiposity and insulin sensitivity in U.K. populations and to examine the role of LPIN1 mutations in insulin resistance syndromes.
Research design and method: Twenty-two single nucleotide polymorphisms tagging common LPIN1 variation were genotyped in Medical Research Council (MRC) Ely (n = 1,709) and Hertfordshire (n = 2,901) population-based cohorts. LPIN1 exons, exon/intron boundaries, and 3' untranslated region were sequenced in 158 patients with idiopathic severe insulin resistance (including 23 lipodystrophic patients) and 48 control subjects.
Results: We found no association between LPIN1 single nucleotide polymorphisms and fasting insulin but report a nominal association between rs13412852 and BMI (P = 0.042) in a meta-analysis of 8,504 samples from in-house and publicly available studies. Three rare nonsynonymous variants (A353T, R552K, and G582R) were detected in severely insulin-resistant patients. However, these did not cosegregate with disease in affected families, and Lipin1 protein expression and phosphorylation in patients with variants were indistinguishable from those in control subjects.
Conclusions: Our data do not support a major effect of common LPIN1 variation on metabolic traits and suggest that mutations in LPIN1 are not a common cause of lipodystrophy in humans. The nominal associations with BMI and other metabolic traits in U.K. cohorts require replication in larger cohorts.
Funded by: Medical Research Council: G0000934, G0000934(68341), G0701446, MC_U106188470, MC_U147574221, MC_U147585824, MC_UP_A620_1014, U.1475.00.002.00001.01 (85824), U.1475.00.004.00002.01(74221); Wellcome Trust: 068545, 068545/Z/02, 077016, 078986, 078986/Z/06/Z, 080952, 080952/Z/06/Z
Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder.
Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
To identify susceptibility loci for bipolar disorder, we tested 1.8 million variants in 4,387 cases and 6,209 controls and identified a region of strong association (rs10994336, P = 9.1 × 10^-9) in ANK3 (ankyrin G). We also found further support for the previously reported CACNA1C (alpha 1C subunit of the L-type voltage-gated calcium channel; combined P = 7.0 × 10^-8, rs1006737). Our results suggest that ion channelopathies may be involved in the pathogenesis of bipolar disorder.
Funded by: Chief Scientist Office; Medical Research Council: G0500791, G0701003, G9309834, G9623693N; NCRR NIH HHS: U54 RR020278; NIMH NIH HHS: MH062137, MH063445, MH067288, MH63420, N01MH80001, R01 MH062137, R01 MH063420, R01 MH063445, R01 MH067288; Wellcome Trust: 076113, 077011, 082371
Nature genetics 2008;40;9;1056-8
The minimum information about a genome sequence (MIGS) specification.
Natural Environment Research Council Centre for Ecology and Hydrology, Oxford OX1 3SR, UK.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed under the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E025080/1; Intramural NIH HHS: Z99 LM999999; Medical Research Council: G8225539; NHGRI NIH HHS: U54 HG004028
Nature biotechnology 2008;26;5;541-7
The Pfam protein families database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).
Funded by: Biotechnology and Biological Sciences Research Council: BB/F010435/1; Medical Research Council: G0100305; Wellcome Trust: 087656
Nucleic acids research 2008;36;Database issue;D281-8
Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn's disease.
Department of Medical and Molecular Genetics, King's College London School of Medicine, 8th Floor Guy's Tower, Guy's Hospital, London SE1 9RT, UK.
We report results of a nonsynonymous SNP scan for ulcerative colitis and identify a previously unknown susceptibility locus at ECM1. We also show that several risk loci are common to ulcerative colitis and Crohn's disease (IL23R, IL12B, HLA, NKX2-3 and MST1), whereas autophagy genes ATG16L1 and IRGM, along with NOD2 (also known as CARD15), are specific for Crohn's disease. These data provide the first detailed illustration of the genetic relationship between these common inflammatory bowel diseases.
Funded by: Chief Scientist Office: CZB/4/540; Medical Research Council: G0000934, G0400874, G0600329, G0800383, G0800759, G0802320, MC_QA137934, MC_U105260799; Wellcome Trust: 076113, 077011, 089120, 090532
Nature genetics 2008;40;6;710-2
Ensembl 2008.
European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein-DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E010768/1, BB/E011640/1, BBE0116401, BBS/B/13438, BBS/B/13446, BBS/B/13462, BBS/B/13470; Wellcome Trust: 062023, 077198
Nucleic acids research 2008;36;Database issue;D707-14
Testing of diabetes-associated WFS1 polymorphisms in the Diabetes Prevention Program.
Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.
Aims/hypothesis: Wolfram syndrome (diabetes insipidus, diabetes mellitus, optic atrophy and deafness) is caused by mutations in the WFS1 gene. Recently, single nucleotide polymorphisms (SNPs) in WFS1 have been reproducibly associated with type 2 diabetes. We therefore examined the effects of these variants on diabetes incidence and response to interventions in the Diabetes Prevention Program (DPP), in which a lifestyle intervention or metformin treatment was compared with placebo.
Methods: We genotyped the WFS1 SNPs rs10010131, rs752854 and rs734312 (H611R) in 3,548 DPP participants and performed Cox regression analysis using genotype, intervention and their interactions as predictors of diabetes incidence. We also evaluated the effect of these SNPs on insulin resistance and beta cell function at 1 year.
Results: Although none of the three SNPs was associated with diabetes incidence in the overall cohort, white homozygotes for the previously reported protective alleles appeared less likely to develop diabetes in the lifestyle arm. Examination of the publicly available Diabetes Genetics Initiative genome-wide association dataset revealed that rs10012946, which is in strong linkage disequilibrium with the three WFS1 SNPs (r^2 = 0.88-1.0), was associated with type 2 diabetes (allelic odds ratio 0.85, 95% CI 0.75-0.97, p=0.026). In the DPP, we noted a trend towards increased insulin secretion in carriers of the protective variants, although for most SNPs this was seen as compensatory for the diminished insulin sensitivity.
Conclusions/interpretation: The previously reported protective effect of select WFS1 alleles may be magnified by a lifestyle intervention. These variants appear to confer an improvement in beta cell function.
Funded by: Intramural NIH HHS; Medical Research Council: MC_U106179471; NIDDK NIH HHS: K23 DK65978-04, R01 DK072041-02, U01 DK048489, U01 DK048489-06
The Catalogue of Somatic Mutations in Cancer (COSMIC).
Wellcome Trust Genome Campus, Hinxton, United Kingdom.
COSMIC is currently the most comprehensive global resource for information on somatic mutations in human cancer, combining curation of the scientific literature with tumor resequencing data from the Cancer Genome Project at the Sanger Institute, U.K. Almost 4800 genes and 250000 tumors have been examined, resulting in over 50000 mutations available for investigation. This information can be accessed in a number of ways, the most convenient being the Web-based system which allows detailed data mining, presenting the results in easily interpretable formats. This unit describes the graphical system in detail, elaborating an example walkthrough and the many ways that the resulting information can be thoroughly investigated by combining data, respecializing the query, or viewing the results in different ways. Alternate protocols give an overview of the precompiled data files available for download.
Funded by: Wellcome Trust: 077012
Current protocols in human genetics 2008;Chapter 10;Unit 10.11
Replication of the association between variants in WFS1 and risk of type 2 diabetes in European populations.
Department of Public Health and Clinical Medicine, Umeå University Hospital, Umeå, Sweden.
Aims/hypothesis: Mutations at the gene encoding wolframin (WFS1) cause Wolfram syndrome, a rare neurological condition. Associations between single nucleotide polymorphisms (SNPs) at WFS1 and type 2 diabetes have recently been reported. Thus, our aim was to replicate those associations in a northern Swedish case-control study of type 2 diabetes. We also performed a meta-analysis of published and previously unpublished data from Sweden, Finland and France, to obtain updated summary effect estimates.
Methods: Four WFS1 SNPs (rs10010131, rs6446482, rs752854 and rs734312 [H611R]) were genotyped in a type 2 diabetes case-control study (n = 1,296/1,412) of Swedish adults. Logistic regression was used to assess the association between each WFS1 SNP and type 2 diabetes, following adjustment for age, sex and BMI. We then performed a meta-analysis of 11 studies of type 2 diabetes, comprising up to 14,139 patients and 16,109 controls, to obtain a summary effect estimate for the WFS1 variants.
Results: In the northern Swedish study, the minor allele at rs752854 was associated with reduced type 2 diabetes risk [odds ratio (OR) 0.85, 95% CI 0.75-0.96, p=0.010]. Borderline statistical associations were observed for the remaining SNPs. The meta-analysis of the four independent replication studies for SNP rs10010131 and correlated variants showed evidence for statistical association (OR 0.87, 95% CI 0.82-0.93, p=4.5 x 10(-5)). In an updated meta-analysis of all 11 studies, strong evidence of statistical association was also observed (OR 0.89, 95% CI 0.86-0.92; p=4.9 x 10(-11)).
Conclusions/interpretation: In this study of WFS1 variants and type 2 diabetes risk, we have replicated the previously reported associations between SNPs at this locus and the risk of type 2 diabetes.
Funded by: Medical Research Council: G0600331, MC_U106179471; NIDDK NIH HHS: DK62370, DK72193; Wellcome Trust: 077016
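The odds ratios and confidence intervals quoted in the Results above can be checked against the reported p values. The following is a minimal sketch in Python, assuming a Wald test on the log odds ratio under a normal approximation; the abstract does not say which test the authors used, and the function name is illustrative only.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p value from an odds ratio and its 95% CI.

    Assumption: the interval is symmetric on the log scale (Wald interval),
    so its half-width equals z_crit times the standard error.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# rs752854 in the northern Swedish study: OR 0.85, 95% CI 0.75-0.96
z, p = p_from_or_ci(0.85, 0.75, 0.96)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For the rs752854 figures this gives a p value close to the reported 0.010. Because the inputs are rounded to two decimal places, the same check is much less precise for the very small p values reported for the meta-analyses.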
SrfB, a member of the Serum Response Factor family of transcription factors, regulates starvation response and early development in Dictyostelium.
Instituto de Investigaciones Biomédicas CSIC/UAM. Arturo Duperier, 4. 28029 Madrid, Spain.
The Serum Response Factor (SRF) is an important regulator of cell proliferation and differentiation. The Dictyostelium discoideum srfB gene codes for an SRF homologue and is expressed in vegetative cells and during development under the control of three alternative promoters, which show different cell-type specific patterns of expression. The two more proximal promoters directed gene transcription in prestalk AB, stalk and lower-cup cells. The generation of a strain where the srfB gene has been interrupted (srfB(-)) has shown that this gene is required for regulation of actin-cytoskeleton-related functions, such as cytokinesis and macropinocytosis. The mutant failed to develop well in suspension, but could be rescued by cAMP pulsing, suggesting a defect in cAMP signaling. srfB(-) cells showed impaired chemotaxis to cAMP and defective lateral pseudopodium inhibition. Nevertheless, srfB(-) cells aggregated on agar plates and nitrocellulose filters 2 h earlier than wild type cells, and completed development, showing an increased tendency to form slug structures. Analysis of wild type and srfB(-) strains detected significant differences in the regulation of gene expression upon starvation. Genes coding for lysosomal and ribosomal proteins, developmentally-regulated genes, and some genes coding for proteins involved in cytoskeleton regulation were deregulated during the first stages of development.
Funded by: Wellcome Trust
Developmental biology 2008;316;2;260-74
ES cell pluripotency and germ-layer formation require the SWI/SNF chromatin remodeling component BAF250a.
Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Richard Simches Research Center, 185 Cambridge Street, Boston, MA 02114, USA.
ATP-dependent chromatin remodeling complexes are a notable group of epigenetic modifiers that use the energy of ATP hydrolysis to change the structure of chromatin, thereby altering its accessibility to nuclear factors. BAF250a (ARID1a) is a unique and defining subunit of the BAF chromatin remodeling complex with the potential to facilitate chromosome alterations critical during development. Our studies show that ablation of BAF250a in early mouse embryos results in developmental arrest (about embryonic day 6.5) and absence of the mesodermal layer, indicating its critical role in early germ-layer formation. Moreover, BAF250a deficiency compromises ES cell pluripotency, severely inhibits self-renewal, and promotes differentiation into primitive endoderm-like cells under normal feeder-free culture conditions. Interestingly, this phenotype can be partially rescued by the presence of embryonic fibroblast cells. DNA microarray, immunostaining, and RNA analyses revealed that BAF250a-mediated chromatin remodeling contributes to the proper expression of numerous genes involved in ES cell self-renewal, including Sox2, Utf1, and Oct4. Furthermore, the pluripotency defects in BAF250a mutant ES cells appear to be cell lineage-specific. For example, embryoid body-based analyses demonstrated that BAF250a-ablated stem cells are defective in differentiating into fully functional mesoderm-derived cardiomyocytes and adipocytes but are capable of differentiating into ectoderm-derived neurons. Our results suggest that BAF250a is a key component of the gene regulatory machinery in ES cells controlling self-renewal, differentiation, and cell lineage decisions.
Funded by: Howard Hughes Medical Institute
Proceedings of the National Academy of Sciences of the United States of America 2008;105;18;6656-61
Apheresis donors and platelet function: inherent platelet responsiveness influences platelet quality.
Department of Haematology, University of Cambridge, Cambridge, UK.
Background: Process-induced platelet (PLT) activation occurs with all production methods, including apheresis. Recent studies have highlighted the range and consistency of interindividual variation in the PLT response, but little is known about the contribution of a donor's inherent PLT responsiveness to the activation state of the apheresis PLTs or the effect of frequent apheresis on donors' PLTs.
Study design and methods: The effect of the donors' PLT response on the apheresis PLTs was studied in 47 individuals selected as having PLTs with inherently low, intermediate, or high responsiveness. Whole-blood flow cytometry was used to measure PLT activation (levels of bound fibrinogen) before donation and in the apheresis PLTs. The effects of regular apheresis on the activation status of donors' PLTs were studied by comparing the in vivo activation status of PLTs from apheresis (n = 349) and whole-blood donors (n = 157), before donation. The effect of apheresis per se on PLT activation was measured in 10 apheresis donors before and after donation.
Results: The level of PLT activation in the apheresis packs was generally higher than in the donor, and the most activated PLTs were from high-responder donors. There was no significant difference in PLT activation before donation between the apheresis and whole-blood donors (p = 0.697), and there was no consistent evidence of activation in the donors immediately after apheresis.
Conclusion: The most activated apheresis PLTs were obtained from donors with more responsive PLTs. Regular apheresis, however, does not lead to PLT activation in the donors.
The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.
Japan Biological Information Research Center, Japan Biological Informatics Consortium, Japan.
Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.
Funded by: NHLBI NIH HHS: P50 HL054998, R01 HL064541; Wellcome Trust: 077198
Nucleic acids research 2008;36;Database issue;D793-9
The missing link: Bordetella petrii is endowed with both the metabolic versatility of environmental bacteria and virulence traits of pathogenic Bordetellae.
Chair of Microbiology, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany. firstname.lastname@example.org
Background: Bordetella petrii is the only environmental species hitherto found among the otherwise host-restricted and pathogenic members of the genus Bordetella. Phylogenetically, it connects the pathogenic Bordetellae and environmental bacteria of the genera Achromobacter and Alcaligenes, which are opportunistic pathogens. B. petrii strains have been isolated from very different environmental niches, including river sediment, polluted soil, marine sponges and a grass root. Recently, clinical isolates associated with bone degenerative disease or cystic fibrosis have also been described.
Results: In this manuscript we present the results of the analysis of the completely annotated genome sequence of the B. petrii strain DSMZ12804. B. petrii has a mosaic genome of 5,287,950 bp harboring numerous mobile genetic elements, including seven large genomic islands. Four of them are highly related to the clc element of Pseudomonas knackmussii B13, which encodes genes involved in the degradation of aromatics. Though being an environmental isolate, the sequenced B. petrii strain also encodes proteins related to virulence factors of the pathogenic Bordetellae, including the filamentous hemagglutinin, which is a major colonization factor of B. pertussis, and the master virulence regulator BvgAS. However, it lacks all known toxins of the pathogenic Bordetellae.
Conclusion: The genomic analysis suggests that B. petrii represents an evolutionary link between free-living environmental bacteria and the host-restricted obligate pathogenic Bordetellae. Its remarkable metabolic versatility may enable B. petrii to thrive in very different ecological niches.
BMC genomics 2008;9;449
A novel streptococcal integrative conjugative element involved in iron acquisition.
Centre for Preventive Medicine, Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk, UK.
In this study, we determined the function of a novel non-ribosomal peptide synthetase (NRPS) system carried by a streptococcal integrative conjugative element (ICE), ICESe2. The NRPS shares similarity with the yersiniabactin system found in the high-pathogenicity island of Yersinia sp. and is the first of its kind to be identified in streptococci. We named the NRPS product 'equibactin' and genes of this locus eqbA-N. ICESe2, although absolutely conserved in Streptococcus equi, the causative agent of equine strangles, was absent from all strains of the closely related opportunistic pathogen Streptococcus zooepidemicus. Binding of EqbA, a DtxR-like regulator, to the eqbB promoter was increased in the presence of cations. Deletion of eqbA resulted in a small-colony phenotype. Further deletion of the irp2 homologue eqbE, or the genes eqbH, eqbI and eqbJ encoding a putative ABC transporter, or addition of the iron chelator nitrilotriacetate, reversed this phenotype, implicating iron toxicity. Quantification of (55)Fe accumulation and sensitivity to streptonigrin suggested that equibactin is secreted by S. equi and that the eqbH, eqbI and eqbJ genes are required for its associated iron import. In agreement with a structure-based model of equibactin synthesis, supplementation of chemically defined media with salicylate was required for equibactin production.
Molecular microbiology 2008;70;5;1274-92
Sequence data swell for nematodes.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. email@example.com
With more than 80,000 described species that are extremely diverse in terms of ecology and biology, the Nematoda phylum is one of the most common animal phyla. This month's Genome Watch describes genomes of several nematodes, including that of the human filarial parasite Brugia malayi.
Nature reviews. Microbiology 2008;6;11;800-1
Telomeric expression sites are highly conserved in Trypanosoma brucei.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. firstname.lastname@example.org
Subtelomeric regions are often under-represented in genome sequences of eukaryotes. One of the best known examples of the use of telomere proximity for adaptive purposes is the bloodstream expression sites (BESs) of the African trypanosome Trypanosoma brucei. To enhance our understanding of BES structure and function in host adaptation and immune evasion, the BES repertoire from the Lister 427 strain of T. brucei was independently tagged and sequenced. BESs are polymorphic in size and structure but reveal a surprisingly conserved architecture in the context of extensive recombination. Very small BESs do exist and many functioning BESs do not contain the full complement of expression site associated genes (ESAGs). The consequences of duplicated or missing ESAGs, including ESAG9, a newly named ESAG12, and additional variant surface glycoprotein genes (VSGs) were evaluated by functional assays after BESs were tagged with a drug-resistance gene. Phylogenetic analysis of constituent ESAG families suggests that BESs are sequence mosaics and that extensive recombination has shaped the evolution of the BES repertoire. This work opens important perspectives in understanding the molecular mechanisms of antigenic variation, a widely used strategy for immune evasion in pathogens, and telomere biology.
Funded by: NIAID NIH HHS: R01 AI021729, R01AI021729; Wellcome Trust: 095161
PloS one 2008;3;10;e3527
A Myo6 mutation destroys coordination between the myosin heads, revealing new functions of myosin VI in the stereocilia of mammalian inner ear hair cells.
Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel.
Myosin VI, found in organisms from Caenorhabditis elegans to humans, is essential for auditory and vestibular function in mammals, since genetic mutations lead to hearing impairment and vestibular dysfunction in both humans and mice. Here, we show that a missense mutation in this molecular motor in an ENU-generated mouse model, Tailchaser, disrupts myosin VI function. Structural changes in the Tailchaser hair bundles include mislocalization of the kinocilia and branching of stereocilia. Transfection of GFP-labeled myosin VI into epithelial cells and delivery of endocytic vesicles to the early endosome revealed that the mutant phenotype displays disrupted motor function. The actin-activated ATPase rates measured for the D179Y mutation are decreased, and indicate loss of coordination of the myosin VI heads or 'gating' in the dimer form. Proper coordination is required for walking processively along, or anchoring to, actin filaments, and is apparently destroyed by the proximity of the mutation to the nucleotide-binding pocket. This loss of myosin VI function may not allow myosin VI to transport its cargoes appropriately at the base and within the stereocilia, or to anchor the membrane of stereocilia to actin filaments via its cargoes, both of which lead to structural changes in the stereocilia of myosin VI-impaired hair cells and ultimately to deafness.
Funded by: Medical Research Council: G0300212, MC_QA137918; NEI NIH HHS: R01 EY012695, R01-EY12695; NIDCD NIH HHS: R01-DC0099100; Wellcome Trust
PLoS genetics 2008;4;10;e1000207
The genome sequence of the fish pathogen Aliivibrio salmonicida strain LFI1238 shows extensive evidence of gene decay.
Department of Molecular Biotechnology, Institute of Medical Biology, Faculty of Medicine, University of Tromsø, N-9037 Tromsø, Norway. email@example.com
Background: The fish pathogen Aliivibrio salmonicida is the causative agent of cold-water vibriosis in marine aquaculture. The Gram-negative bacterium causes tissue degradation, hemolysis and sepsis in vivo.
Results: In total, 4 286 protein coding sequences were identified, and the 4.6 Mb genome of A. salmonicida has a six-partite architecture with two chromosomes and four plasmids. Sequence analysis revealed a highly fragmented genome structure caused by the insertion of an extensive number of insertion sequence (IS) elements. The IS elements can be related to important evolutionary events such as gene acquisition, gene loss and chromosomal rearrangements. New A. salmonicida functional capabilities that may have been acquired through horizontal DNA transfer include genes involved in iron acquisition and protein secretion, which may play roles in pathogenicity. On the other hand, the degeneration of 370 genes and consequent loss of specific functions suggest that A. salmonicida has a reduced metabolic and physiological capacity in comparison to related Vibrionaceae species.
Conclusion: Most prominent is the loss of several genes involved in the utilisation of the polysaccharide chitin. In particular, the disruption of three extracellular chitinases responsible for enzymatic breakdown of chitin makes A. salmonicida unable to grow on the polymer form of chitin. These and other losses could restrict the variety of carrier organisms A. salmonicida can attach to and associate with. Gene acquisition and gene loss may be related to the emergence of A. salmonicida as a fish pathogen.
BMC genomics 2008;9;616
High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
Isolates of Salmonella enterica serovar Typhi (Typhi), a human-restricted bacterial pathogen that causes typhoid, show limited genetic variation. We generated whole-genome sequences for 19 Typhi isolates using 454 (Roche) and Solexa (Illumina) technologies. Isolates, including the previously sequenced CT18 and Ty2 isolates, were selected to represent major nodes in the phylogenetic tree. Comparative analysis showed little evidence of purifying selection, antigenic variation or recombination between isolates. Rather, evolution in the Typhi population seems to be characterized by ongoing loss of gene function, consistent with a small effective population size. The lack of evidence for antigenic variation driven by immune selection is in contrast to strong adaptive selection for mutations conferring antibiotic resistance in Typhi. The observed patterns of genetic isolation and drift are consistent with the proposed key role of asymptomatic carriers of Typhi as the main reservoir of this pathogen, highlighting the need for identification and treatment of carriers.
Funded by: Wellcome Trust: 067321
Nature genetics 2008;40;8;987-93
Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 048880, 062023, 077198
Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes.
Department of Pathology, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, UK.
Chromosome translocations in the common epithelial cancers are abundant, yet little is known about them. They have been thought to be almost all unbalanced and therefore dismissed as mostly mediating tumour suppressor loss. We present a comprehensive analysis by array painting of the chromosome translocations of breast cancer cell lines HCC1806, HCC1187 and ZR-75-30. In array painting, chromosomes are isolated by flow cytometry, amplified and hybridized to DNA microarrays. A total of 200 breakpoints were identified and all were mapped to 1 Mb resolution on bacterial artificial chromosome (BAC) arrays, then 40 selected breakpoints, including all balanced breakpoints, were further mapped on tiling-path BAC arrays or to around 2 kb resolution using oligonucleotide arrays. First, many more of the translocations were balanced at 1 Mb resolution than expected, either reciprocal (eight in total) or balanced for at least one participating chromosome (19 paired breakpoints). Second, many of the breakpoints were at genes that are plausible targets of oncogenic translocation, including balanced breaks at CTCF, EP300/p300 and FOXP4. Two gene fusions were demonstrated, TAX1BP1-AHCY and RIF1-PKD1L1. Our results support the idea that chromosome rearrangements may play an important role in common epithelial cancers such as breast cancer.
Funded by: Cancer Research UK: A4392; Wellcome Trust: 077008
Newly identified genetic risk variants for celiac disease related to the immune response.
Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, 4 Newark Street, London E1 2AT, UK.
Our genome-wide association study of celiac disease previously identified risk variants in the IL2-IL21 region. To identify additional risk variants, we genotyped 1,020 of the most strongly associated non-HLA markers in an additional 1,643 cases and 3,406 controls. Through joint analysis including the genome-wide association study data (767 cases, 1,422 controls), we identified seven previously unknown risk regions (P < 5 x 10(-7)). Six regions harbor genes controlling immune responses, including CCR3, IL12A, IL18RAP, RGS1, SH2B3 (nsSNP rs3184504) and TAGAP. Whole-blood IL18RAP mRNA expression correlated with IL18RAP genotype. Type 1 diabetes and celiac disease share HLA-DQ, IL2-IL21, CCR3 and SH2B3 risk regions. Thus, this extensive genome-wide association follow-up study has identified additional celiac disease risk variants in relevant biological pathways.
Funded by: Medical Research Council: G0000934; Wellcome Trust: 068094, 068545/Z/02, 084743, GR068094MA
Nature genetics 2008;40;4;395-402
The functional impact of structural variation in humans.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. email@example.com
Structural variation includes many different types of chromosomal rearrangement and encompasses millions of bases in every human genome. Over the past 3 years, the extent and complexity of structural variation has become better appreciated. Diverse approaches have been adopted to explore the functional impact of this class of variation. As disparate indications of the important biological consequences of genome dynamism are accumulating rapidly, we review the evidence that structural variation has an appreciable impact on cellular phenotypes, disease and human evolution.
Funded by: Wellcome Trust: 077009, 077014, 077046
Trends in genetics : TIG 2008;24;5;238-45
A novel CpG island set identifies tissue-specific methylation at developmental gene loci.
Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom.
CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%-8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.
Funded by: Wellcome Trust
PLoS biology 2008;6;1;e22
The genome-wide patterns of variation expose significant substructure in a founder population.
Department of Molecular Medicine, National Public Health Institute and Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland.
Although high-density SNP genotyping platforms generate a momentum for detailed genome-wide association (GWA) studies, an offshoot is a new insight into population genetics. Here, we present an example in one of the best-known founder populations by scrutinizing ten distinct Finnish early- and late-settlement subpopulations. By determining genetic distances, homozygosity, and patterns of linkage disequilibrium, we demonstrate that population substructure, and even individual ancestry, is detectable at a very high resolution and supports the concept of multiple historical bottlenecks resulting from consecutive founder effects. Given that genetic studies are currently aiming at identifying smaller and smaller genetic effects, recognizing and controlling for population substructure even at this fine level becomes imperative to avoid confounding and spurious associations. This study provides an example of the power of GWA data sets to demonstrate stratification caused by population history even within a seemingly homogeneous population, like the Finns. Further, the results provide interesting lessons concerning the impact of population history on the genome landscape of humans, as well as approaches to identify rare variants enriched in these subpopulations.
Funded by: NCRR NIH HHS: U54 RR020278, U54RR020278; NHLBI NIH HHS: 1R01HL087679-01, R01 HL087679
American journal of human genetics 2008;83;6;787-94
A systematic library for comprehensive overexpression screens in Saccharomyces cerevisiae.
Department of Molecular Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA.
Modern genetic analysis requires the development of new resources to systematically explore gene function in vivo. Overexpression screens are a powerful method to investigate genetic pathways, but the goal of routine and comprehensive overexpression screens has been hampered by the lack of systematic libraries. Here we describe the construction of a systematic collection of the Saccharomyces cerevisiae genome in a high-copy vector and its validation in two overexpression screens.
Funded by: NIGMS NIH HHS: GM52486; Wellcome Trust
Nature methods 2008;5;3;239-41
Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans.
Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts 02114, USA. firstname.lastname@example.org
Blood concentrations of lipoproteins and lipids are heritable risk factors for cardiovascular disease. Using genome-wide association data from three studies (n = 8,816 that included 2,758 individuals from the Diabetes Genetics Initiative specific to the current paper as well as 1,874 individuals from the FUSION study of type 2 diabetes and 4,184 individuals from the SardiNIA study of aging-associated variables reported in a companion paper in this issue) and targeted replication association analyses in up to 18,554 independent participants, we show that common SNPs at 18 loci are reproducibly associated with concentrations of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and/or triglycerides. Six of these loci are new (P < 5 x 10(-8) for each new locus). Of the six newly identified chromosomal regions, two were associated with LDL cholesterol (1p13 near CELSR2, PSRC1 and SORT1 and 19p13 near CILP2 and PBX4), one with HDL cholesterol (1q42 in GALNT2) and five with triglycerides (7q11 near TBL2 and MLXIPL, 8q24 near TRIB1, 1q42 in GALNT2, 19p13 near CILP2 and PBX4 and 1p31 near ANGPTL3). At 1p13, the LDL-associated SNP was also strongly correlated with CELSR2, PSRC1, and SORT1 transcript levels in human liver, and a proxy for this SNP was recently shown to affect risk for coronary artery disease. Understanding the molecular, cellular and clinical consequences of the newly identified loci may inform therapy and clinical care.
Funded by: Wellcome Trust: 089061
Nature genetics 2008;40;2;189-97
iMapper: a web application for the automated analysis and mapping of insertional mutagenesis sequence data against Ensembl genomes.
Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1HH, UK.
Summary: Insertional mutagenesis is a powerful method for gene discovery. To identify the location of insertion sites in the genome, linker-based polymerase chain reaction (PCR) methods (such as splinkerette-PCR) may be employed. We have developed a web application called iMapper (Insertional Mutagenesis Mapping and Analysis Tool) for the efficient analysis of insertion site sequence reads against vertebrate and invertebrate Ensembl genomes. Taking linker-based sequences as input, iMapper scans and trims the sequence to remove the linker and sequences derived from the insertional mutagen. The software then identifies and removes contaminating sequences derived from chimeric genomic fragments, vector or the transposon concatamer and then presents the clipped sequence reads to a sequence mapping server which aligns them to an Ensembl genome. Insertion sites can then be navigated in Ensembl in the context of genomic features such as gene structures. iMapper also generates text-based format for nucleic acid or protein sequences (FASTA) and generic file format (GFF) files of the clipped sequence reads and provides a graphical overview of the mapped insertion sites against a karyotype. iMapper is designed for high-throughput applications and can efficiently process thousands of DNA sequence reads.
Availability: iMapper is web based and can be accessed at http://www.sanger.ac.uk/cgi-bin/teams/team113/imapper.cgi.
Funded by: Cancer Research UK: C20510/A6997; Wellcome Trust: 76943
Bioinformatics (Oxford, England) 2008;24;24;2923-5
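The clipping step described in the iMapper summary above can be illustrated with a minimal Python sketch. It assumes exact string matching and a read laid out as mutagen tag, genomic sequence, linker; the function names and the example tag and linker sequences are hypothetical and are not taken from iMapper itself, which may locate these sequences differently.

```python
from typing import Optional


def clip_read(read: str, mutagen_tag: str, linker: str) -> Optional[str]:
    """Return the genomic part of a read, or None if the mutagen tag is absent.

    Assumed layout (illustrative only): <mutagen tag><genomic sequence><linker>.
    """
    seq = read.upper()
    start = seq.find(mutagen_tag.upper())
    if start == -1:
        return None  # no mutagen tag, so the insertion junction cannot be located
    begin = start + len(mutagen_tag)
    end = seq.find(linker.upper(), begin)
    if end == -1:
        end = len(seq)  # linker not reached; keep everything after the tag
    return seq[begin:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one clipped read as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Hypothetical example sequences, chosen only to exercise the functions.
TAG = "TTAACCCTAGAAAGATA"
LINKER = "GTAATACGACTCAC"
clipped = clip_read(TAG + "ACGTACGTGGCC" + LINKER, TAG, LINKER)
record = to_fasta("read1", clipped)
```

Exact matching keeps the sketch short; sequencing errors in real reads would call for a tolerant alignment step instead.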
Normal germ line establishment in mice carrying a deletion of the Ifitm/Fragilis gene family cluster.
Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer & Developmental Biology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, United Kingdom.
The family of interferon-inducible transmembrane proteins (Ifitm) consists of five highly sequence-related cell surface proteins, which are implicated in diverse cellular processes. Ifitm genes are conserved, widely expressed, and characteristically found in genomic clusters, such as the 67-kb Ifitm family locus on mouse chromosome 7. Recently, Ifitm1 and Ifitm3 have been suggested to mediate migration of early primordial germ cells (PGCs), a process that is little understood. To investigate Ifitm function during germ cell development, we used targeted chromosome engineering to generate mutants which either lack the entire Ifitm locus or carry a disrupted Ifitm3 gene only. Here we show that the mutations have no detectable effects on development of the germ line or on the generation of live young. Hence, contrary to previous reports, Ifitm genes are not essential for PGC migration. The Ifitm family is a striking example of a conserved gene cluster which appears to be functionally redundant during development.
Funded by: Biotechnology and Biological Sciences Research Council; Cancer Research UK; Wellcome Trust: 065601
Molecular and cellular biology 2008;28;15;4688-96
Host transmission of Salmonella enterica serovar Typhimurium is controlled by virulence factors and indigenous intestinal microbiota.
Department of Microbiology and Immunology, 299 Campus Drive, Stanford University, Stanford, CA 94305, USA.
Transmission is an essential stage of a pathogen's life cycle and remains poorly understood. We describe here a model in which persistently infected 129X1/SvJ mice provide a natural model of Salmonella enterica serovar Typhimurium transmission. In this model only a subset of the infected mice, termed supershedders, shed high levels (>10(8) CFU/g) of Salmonella serovar Typhimurium in their feces and, as a result, rapidly transmit infection. While most Salmonella serovar Typhimurium-infected mice show signs of intestinal inflammation, only supershedder mice develop colitis. Development of the supershedder phenotype depends on the virulence determinants Salmonella pathogenicity islands 1 and 2, and it is characterized by mucosal invasion and, importantly, high luminal abundance of Salmonella serovar Typhimurium within the colon. Immunosuppression of infected mice does not induce the supershedder phenotype, demonstrating that the immune response is not the main determinant of Salmonella serovar Typhimurium levels within the colon. In contrast, treatment of mice with antibiotics that alter the health-associated indigenous intestinal microbiota rapidly induces the supershedder phenotype in infected mice and predisposes uninfected mice to the supershedder phenotype for several days. These results demonstrate that the intestinal microbiota plays a critical role in controlling Salmonella serovar Typhimurium infection, disease, and transmissibility. This novel model should facilitate the study of host, pathogen, and intestinal microbiota factors that contribute to infectious disease transmission.
Funded by: NIAID NIH HHS: AI26195, R01 AI026195
Infection and immunity 2008;76;1;403-16
A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans.
Center for Systems and Synthetic Biology, Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, MBB 3.210, Austin, Texas 78712, USA.
The fundamental aim of genetics is to understand how an organism's phenotype is determined by its genotype, and implicit in this is predicting how changes in DNA sequence alter phenotypes. A single network covering all the genes of an organism might guide such predictions down to the level of individual cells and tissues. To validate this approach, we computationally generated a network covering most C. elegans genes and tested its predictive capacity. Connectivity within this network predicts essentiality, identifying this relationship as an evolutionarily conserved biological principle. Critically, the network makes tissue-specific predictions: we accurately identify genes for most systematically assayed loss-of-function phenotypes, which span diverse cellular and developmental processes. Using the network, we identify 16 genes whose inactivation suppresses defects in the retinoblastoma tumor suppressor pathway, and we successfully predict that the dystrophin complex modulates EGF signaling. We conclude that an analogous network for human genes might be similarly predictive and thus facilitate identification of disease genes and rational therapeutic targets.
Funded by: NIGMS NIH HHS: GM06779-01; Wellcome Trust
Nature genetics 2008;40;2;181-8
Evolution of the Rhodococcus equi vap pathogenicity island seen through comparison of host-associated vapA and vapB virulence plasmids.
Division of Microbial Pathogenesis, Centre for Infectious Diseases, Ashworth Laboratories, King's Buildings, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
The pathogenic actinomycete Rhodococcus equi harbors different types of virulence plasmids associated with specific nonhuman hosts. We determined the complete DNA sequence of a vapB(+) plasmid, typically associated with pig isolates, and compared it with that of the horse-specific vapA(+) plasmid type. pVAPB1593, a circular 79,251-bp element, had the same housekeeping backbone as the vapA(+) plasmid but differed over an approximately 22-kb region. This variable region encompassed the vap pathogenicity island (PAI), was clearly subject to selective pressures different from those affecting the backbone, and showed major genetic rearrangements involving the vap genes. The pVAPB1593 PAI harbored five different vap genes (vapB and vapJ to -M, with vapK present in two copies), which encoded products differing by 24 to 84% in amino acid sequence from the six full-length vapA(+) plasmid-encoded Vap proteins, consistent with a role for the specific vap gene complement in R. equi host tropism. Sequence analyses, including interpolated variable-order motifs for detection of alien DNA and reconstruction of Vap family phylogenetic relationships, suggested that the vap PAI was acquired by an ancestor plasmid via lateral gene transfer, subsequently evolving by vap gene duplication and sequence diversification to give different (host-adapted) plasmids. The R. equi virulence plasmids belong to a new family of actinobacterial circular replicons characterized by an ancient conjugative backbone and a horizontally acquired niche-adaptive plasticity region.
Journal of bacteriology 2008;190;17;5797-805
Identification of ten loci associated with height highlights new biological pathways in human growth.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Height is a classic polygenic trait, reflecting the combined influence of multiple as-yet-undiscovered genetic factors. We carried out a meta-analysis of genome-wide association study data of height from 15,821 individuals at 2.2 million SNPs, and followed up the strongest findings in >10,000 subjects. Ten newly identified and two previously reported loci were strongly associated with variation in height (P values from 4 x 10(-7) to 8 x 10(-22)). Together, these 12 loci account for approximately 2% of the population variation in height. Individuals with ≤8 height-increasing alleles and ≥16 height-increasing alleles differ in height by approximately 3.5 cm. The newly identified loci, along with several additional loci with strongly suggestive associations, encompass both strong biological candidates and unexpected genes, and highlight several pathways (let-7 targets, chromatin remodeling proteins and Hedgehog signaling) as important regulators of human stature. These results expand the picture of the biological regulation of human height and of the genetic architecture of this classical complex trait.
Funded by: Intramural NIH HHS; NCI NIH HHS: 5P01CA087969, 5U01CA098233, CA49449; NHGRI NIH HHS: HG02651; NHLBI NIH HHS: HL084729; NIDDK NIH HHS: 5 R01 DK 075787, DK62370, DK72193, R01 DK072193; Wellcome Trust: 089061
Nature genetics 2008;40;5;584-91
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom.
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
Funded by: Wellcome Trust
Genome research 2008;18;11;1851-8
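The mapping quality idea in the MAQ abstract above can be made concrete with a short Python sketch. It assumes the usual Phred-style scaling, Q = -10 log10(P), applied to the probability that a read is placed wrongly, and it approximates that probability from the relative likelihoods of the candidate positions. The function names, the cap of 99 and the likelihood inputs are illustrative assumptions; this is a simplification, not MAQ's actual implementation.

```python
import math
from typing import Sequence


def phred_quality(p_error: float, cap: int = 99) -> int:
    """Convert an error probability to a Phred-scaled quality, capped at `cap`."""
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must lie in [0, 1]")
    if p_error == 0.0:
        return cap  # avoid log10(0); report the assumed maximum quality
    return min(cap, round(-10 * math.log10(p_error)))


def mapping_quality(candidate_likelihoods: Sequence[float], cap: int = 99) -> int:
    """Quality of the best placement given non-negative likelihoods of all candidates.

    P(wrong) is taken as 1 - L_best / sum(L), a simplified posterior that
    treats the listed candidates as the only possible origins of the read.
    """
    total = sum(candidate_likelihoods)
    if total <= 0:
        raise ValueError("at least one candidate must have positive likelihood")
    p_wrong = max(0.0, 1.0 - max(candidate_likelihoods) / total)
    return phred_quality(p_wrong, cap)


unique_hit = mapping_quality([1.0])          # a single candidate position
repeat_hit = mapping_quality([0.5, 0.5])     # two equally good positions
```

A read with two equally likely placements receives a very low quality, which is how ambiguity in repetitive regions can be carried forward into downstream genotype calls.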
Lifelong reduction of LDL-cholesterol related to a common variant in the LDL-receptor gene decreases the risk of coronary artery disease--a Mendelian Randomisation study.
Medizinische Klinik II, Universität zu Lübeck, Lübeck, Germany.
Background: Rare mutations of the low-density lipoprotein receptor gene (LDLR) cause familial hypercholesterolemia, which increases the risk for coronary artery disease (CAD). Less is known about the implications of common genetic variation in the LDLR gene regarding the variability of cholesterol levels and risk of CAD.
Methods: Imputed genotype data at the LDLR locus on 1 644 individuals of a population-based sample were explored for association with LDL-C level. Replication of association with LDL-C level was sought for the most significant single nucleotide polymorphism (SNP) within the LDLR gene in three European samples comprising 6 642 adults and 533 children. Association of this SNP with CAD was examined in six case-control studies involving more than 15 000 individuals.
Findings: Each copy of the minor T allele of SNP rs2228671 within LDLR (frequency 11%) was related to a decrease of LDL-C levels by 0.19 mmol/L (95% confidence interval (CI) [0.13-0.24] mmol/L, p = 1.5x10(-10)). This association with LDL-C was uniformly found in children, men, and women of all samples studied. In parallel, the T allele of rs2228671 was associated with a significantly lower risk of CAD (Odds Ratio per copy of the T allele: 0.82, 95% CI [0.76-0.89], p = 2.1x10(-7)). Adjustment for LDL-C levels by logistic regression or Mendelian Randomisation models abolished the significant association between rs2228671 with CAD completely, indicating a functional link between the genetic variant at the LDLR gene locus, change in LDL-C and risk of CAD.
Conclusion: A common variant at the LDLR gene locus affects LDL-C levels and, thereby, the risk for CAD.
Funded by: British Heart Foundation; Medical Research Council
PloS one 2008;3;8;e2986
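As a worked illustration of the Mendelian Randomisation logic in the abstract above, the per-allele figures it reports (LDL-C lower by 0.19 mmol/L, CAD odds ratio 0.82) can be combined with the textbook Wald ratio, which divides the gene-outcome effect by the gene-exposure effect. The sketch below is an assumption-laden illustration: the function name is hypothetical, the ratio estimator is not stated to be the authors' model, and the confidence intervals are ignored.

```python
import math


def wald_ratio_or(or_outcome_per_allele: float,
                  exposure_change_per_allele: float) -> float:
    """Odds ratio for the outcome per one-unit increase in the exposure.

    Wald ratio: log(OR per allele) / (exposure change per allele),
    exponentiated back to the odds-ratio scale.
    """
    if exposure_change_per_allele == 0:
        raise ValueError("the variant must change the exposure")
    log_or = math.log(or_outcome_per_allele) / exposure_change_per_allele
    return math.exp(log_or)


# Per-allele values quoted in the abstract: OR 0.82 for CAD and
# a 0.19 mmol/L decrease in LDL-C (hence the negative sign).
or_per_mmol = wald_ratio_or(0.82, -0.19)
print(f"Implied CAD odds ratio per 1 mmol/L higher LDL-C: {or_per_mmol:.2f}")
```

Under these simplifying assumptions the implied odds ratio per 1 mmol/L of lifelong LDL-C difference is well above 1, in line with the abstract's conclusion that the variant acts on CAD risk through LDL-C.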
The conserved plant sterility gene HAP2 functions after attachment of fusogenic membranes in Chlamydomonas and Plasmodium gametes.
Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
The cellular and molecular mechanisms that underlie species-specific membrane fusion between male and female gametes remain largely unknown. Here, by use of gene discovery methods in the green alga Chlamydomonas, gene disruption in the rodent malaria parasite Plasmodium berghei, and distinctive features of fertilization in both organisms, we report discovery of a mechanism that accounts for a conserved protein required for gamete fusion. A screen for fusion mutants in Chlamydomonas identified a homolog of HAP2, an Arabidopsis sterility gene. Moreover, HAP2 disruption in Plasmodium blocked fertilization and thereby mosquito transmission of malaria. HAP2 localizes at the fusion site of Chlamydomonas minus gametes, yet Chlamydomonas minus and Plasmodium hap2 male gametes retain the ability, using other, species-limited proteins, to form tight prefusion membrane attachments with their respective gamete partners. Membrane dye experiments show that HAP2 is essential for membrane merger. Thus, in two distantly related eukaryotes, species-limited proteins govern access to a conserved protein essential for membrane fusion.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council: G0501670; NIGMS NIH HHS: GM056778, R01 GM056778; Wellcome Trust
Genes & development 2008;22;8;1051-68
Common variants near MC4R are associated with fat mass, weight and risk of obesity.
MRC Epidemiology Unit, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK.
To identify common variants influencing body mass index (BMI), we analyzed genome-wide association data from 16,876 individuals of European descent. After previously reported variants in FTO, the strongest association signal (rs17782313, P = 2.9 x 10(-6)) mapped 188 kb downstream of MC4R (melanocortin-4 receptor), mutations of which are the leading cause of monogenic severe childhood-onset obesity. We confirmed the BMI association in 60,352 adults (per-allele effect = 0.05 Z-score units; P = 2.8 x 10(-15)) and 5,988 children aged 7-11 (0.13 Z-score units; P = 1.5 x 10(-8)). In case-control analyses (n = 10,583), the odds for severe childhood obesity reached 1.30 (P = 8.0 x 10(-11)). Furthermore, we observed overtransmission of the risk allele to obese offspring in 660 families (P (pedigree disequilibrium test average; PDT-avg) = 2.4 x 10(-4)). The SNP location and patterns of phenotypic associations are consistent with effects mediated through altered MC4R function. Our findings establish that common variants near MC4R influence fat mass, weight and obesity risk at the population level and reinforce the need for large-scale data integration to identify variants influencing continuous biomedical traits.
Funded by: British Heart Foundation; Cancer Research UK; Department of Health: DHCS/07/07/008; Medical Research Council: G0000934, G0000934(68341), G0400874, G0401527, G0600331, G0600705, G0601261, G0701863, G9521010, G9521010(63660), G9824984, G9828345, MC_QA137934, MC_U105161047, MC_U105630924, MC_U106179472, MC_U106188470, MC_U147585824, MC_UP_A620_1014, U.1475.00.002.00001.01 (85824); NIDDK NIH HHS: F32 DK079466, F32 DK079466-01, K23 DK080145, K23 DK080145-01, P30 DK040561, P30 DK040561-13, R01 DK072193; Wellcome Trust: 068545, 076113, 077016, 079557, 084713, 090532
Nature genetics 2008;40;6;768-75
Sites of strong Rec12/Spo11 binding in the fission yeast genome are associated with meiotic recombination and with centromeres.
Institute of Cell Biology, University of Bern, 3012 Bern, Switzerland. email@example.com
Meiotic recombination arises from Rec12/Spo11-dependent formation of DNA double-strand breaks (DSBs) and their subsequent repair. We identified Rec12-binding peaks across the Schizosaccharomyces pombe genome using chromatin immunoprecipitation after reversible formaldehyde cross-linking combined with whole-genome DNA microarrays. Strong Rec12 binding coincided with previously identified DSBs at the recombination hotspots ura4A, mbs1, and mbs2 and correlated with DSB formation at a new site. In addition, Rec12 binding corresponded to eight novel conversion hotspots and correlated with crossover density in segments of chromosome I. Notably, Rec12 binding inversely correlated with guanine-cytosine (GC) content, contrary to findings in Saccharomyces cerevisiae. Although both replication origins and Rec12-binding sites preferred AT-rich gene-free regions, they seemed to exclude each other. We also uncovered a connection between binding sites of Rec12 and meiotic cohesin Rec8. Rec12-binding peaks lay often within 2.5 kb of a Rec8-binding peak. Rec12 binding showed preference for large intergenic regions and was found to bind preferentially near to genes expressed strongly in meiosis. Surprisingly, Rec12 binding was also detected in centromeric core regions, which raises the intriguing possibility that Rec12 plays additional roles in meiotic chromosome dynamics.
Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118
MYO5B mutations cause microvillus inclusion disease and disrupt epithelial cell polarity.
Department of Pediatrics II, Innsbruck Medical University, 6020 Innsbruck, Austria.
Following homozygosity mapping in a single kindred, we identified nonsense and missense mutations in MYO5B, encoding type Vb myosin motor protein, in individuals with microvillus inclusion disease (MVID). MVID is characterized by lack of microvilli on the surface of enterocytes and occurrence of intracellular vacuolar structures containing microvilli. In addition, mislocalization of transferrin receptor in MVID enterocytes suggests that MYO5B deficiency causes defective trafficking of apical and basolateral proteins in MVID.
Funded by: Austrian Science Fund FWF: P 19486
Nature genetics 2008;40;10;1163-5
The neglected role of antibody in protection against bacteremia caused by nontyphoidal strains of Salmonella in African children.
Malawi-Liverpool-Wellcome Trust Clinical Research Programme, College of Medicine, University of Malawi, Blantyre, Malawi. firstname.lastname@example.org
Nontyphoidal strains of Salmonella (NTS) are a common cause of bacteremia among African children. Cell-mediated immune responses control intracellular infection, but they do not protect against extracellular growth of NTS in the blood. We investigated whether antibody protects against NTS bacteremia in Malawian children, because we found this condition mainly occurs before 2 years of age, with relative sparing of infants younger than 4 months old. Sera from all healthy Malawian children tested aged more than 16 months contained anti-Salmonella antibody and successfully killed NTS. Killing was mediated by complement membrane attack complex and not augmented in the presence of blood leukocytes. Sera from most healthy children less than 16 months old lacked NTS-specific antibody, and sera lacking antibody did not kill NTS despite normal complement function. Addition of Salmonella-specific antibody, but not mannose-binding lectin, enabled NTS killing. All NTS strains tested had long-chain lipopolysaccharide and the rck gene, features that resist direct complement-mediated killing. Disruption of lipopolysaccharide biosynthesis enabled killing of NTS by serum lacking Salmonella-specific antibody. We conclude that Salmonella-specific antibody that overcomes the complement resistance of NTS develops by 2 years of life in Malawian children. This finding and the age-incidence of NTS bacteremia suggest that antibody protects against NTS bacteremia and support the development of vaccines against NTS that induce protective antibody.
Funded by: Wellcome Trust: 067902
The Journal of clinical investigation 2008;118;4;1553-62
Microbiology in the post-genomic era.
Novartis Vaccines and Diagnostics, 53100 Siena, Italy.
Genomics has revolutionized every aspect of microbiology. Now, 13 years after the first bacterial genome was sequenced, it is important to pause and consider what has changed in microbiology research as a consequence of genomics. In this article, we review the evolving field of bacterial typing and the genomic technologies that enable comparative analysis of multiple genomes and the metagenomes of complex microbial environments, and address the implications of the genomic era for the future of microbiology.
Nature reviews. Microbiology 2008;6;6;419-30
Genomic expression patterns in cell separation mutants of Schizosaccharomyces pombe defective in the genes sep10(+) and sep15(+) coding for the Mediator subunits Med31 and Med8.
Department of Genetics and Applied Microbiology, University of Debrecen, Debrecen, Hungary.
Cell division is controlled by a complex network involving regulated transcription of genes and posttranslational modification of proteins. The aim of this study is to demonstrate that the Mediator complex, a general regulator of transcription, is involved in the regulation of the second phase (cell separation) of cell division of the fission yeast Schizosaccharomyces pombe. In previous studies we have found that the fission yeast cell separation genes sep10(+) and sep15(+) code for proteins (Med31 and Med8) associated with the Mediator complex. Here, we show by genome-wide gene expression profiling of mutants defective in these genes that both Med8 and Med31 control large, partially overlapping sets of genes scattered over the entire genome and involved in diverse biological functions. Six cell separation genes controlled by the transcription factors Sep1 and Ace2 are among the target genes. Since neither sep1(+) nor ace2(+) is affected in the mutant cells, we propose that the Med8 and Med31 proteins act as coactivators of the Sep1-Ace2-dependent cell separation genes. The results also indicate that the subunits of Mediator may contribute to the coordination of cellular processes by fine-tuning of the expression of larger sets of genes.
Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118
Molecular genetics and genomics : MGG 2008;279;3;225-38
Fission yeast SWI/SNF and RSC complexes show compositional and functional differences from budding yeast.
Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
SWI/SNF chromatin-remodeling complexes have crucial roles in transcription and other chromatin-related processes. The analysis of the two members of this class in Saccharomyces cerevisiae, SWI/SNF and RSC, has heavily contributed to our understanding of these complexes. To understand the in vivo functions of SWI/SNF and RSC in an evolutionarily distant organism, we have characterized these complexes in Schizosaccharomyces pombe. Although core components are conserved between the two yeasts, the compositions of S. pombe SWI/SNF and RSC differ from their S. cerevisiae counterparts and in some ways are more similar to metazoan complexes. Furthermore, several of the conserved proteins, including actin-like proteins, are markedly different between the two yeasts with respect to their requirement for viability. Finally, phenotypic and microarray analyses identified widespread requirements for SWI/SNF and RSC on transcription including strong evidence that SWI/SNF directly represses iron-transport genes.
Funded by: Cancer Research UK: A6517, C9546/A6517; NHGRI NIH HHS: HG3456, R01 HG003456, R01 HG003456-03; NIGMS NIH HHS: GM32967, R37 GM032967, R37 GM032967-25; Wellcome Trust: 077118
Nature structural & molecular biology 2008;15;8;873-80
Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.
Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark.
We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.
Funded by: Wellcome Trust
Genome research 2008;18;2;281-92
Dynamic instability of the major urinary protein gene family revealed by genomic and phenotypic comparisons between C57 and 129 strain mice.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB101SA, UK. email@example.com
Background: The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution.
Results: We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified--including a gene/pseudogene pair--suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes.
Conclusion: Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.
Funded by: Biotechnology and Biological Sciences Research Council: S19816; NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198
Genome biology 2008;9;5;R91
Organ-specific requirements for Hdac1 in liver and pancreas formation.
National Institute for Medical Research, Division of Developmental Biology, The Ridgeway, Mill Hill, London, UK.
Liver, pancreas and lung originate from the presumptive foregut in temporal and spatial proximity. This requires precisely orchestrated transcriptional activation and repression of organ-specific gene expression within the same cell. Here, we show distinct roles for the chromatin remodelling factor and transcriptional repressor Histone deacetylase 1 (Hdac1) in endodermal organogenesis in zebrafish. Loss of Hdac1 causes defects in timely liver specification and in subsequent differentiation. Mosaic analyses reveal a cell-autonomous requirement for hdac1 within the hepatic endoderm. Our studies further reveal specific functions for Hdac1 in pancreas development. Loss of hdac1 causes the formation of ectopic endocrine clusters anteriorly to the main islet, as well as defects in exocrine pancreas specification and differentiation. In addition, we observe defects in extrahepatopancreatic duct formation and morphogenesis. Finally, loss of hdac1 results in an expansion of the foregut endoderm in the domain from which the liver and pancreas originate. Our genetic studies demonstrate that Hdac1 is crucial for regulating distinct steps in endodermal organogenesis. This suggests a model in which Hdac1 may directly or indirectly restrict foregut fates while promoting hepatic and exocrine pancreatic specification and differentiation, as well as pancreatic endocrine islet morphogenesis. These findings establish zebrafish as a tractable system to investigate chromatin remodelling factor functions in controlling gene expression programmes in vertebrate endodermal organogenesis.
Funded by: Medical Research Council: MC_U117581329
Developmental biology 2008;322;2;237-50
Mutations in mRNA export mediator GLE1 result in a fetal motoneuron disease.
Department of Molecular Medicine, National Public Health Institute, Helsinki 00290, Finland.
The most severe forms of motoneuron disease manifest in utero and are characterized by marked atrophy of spinal cord motoneurons and fetal immobility. Here, we report that the defective gene underlying lethal motoneuron syndrome LCCS1 is the mRNA export mediator GLE1. Our finding of mutated GLE1 exposes a common pathway connecting the genes implicated in LCCS1, LCCS2 and LCCS3 and elucidates mRNA processing as a critical molecular mechanism in motoneuron development and maturation.
Funded by: NIEHS NIH HHS: P01 ES11253-03; Wellcome Trust: 089061
Nature genetics 2008;40;2;155-7
Post-genomic challenges for collaborative research in infectious diseases.
Department of Biology, Haverford College, Haverford, Pennsylvania 19041, USA. firstname.lastname@example.org
Although high-burden pathogens have been prioritized for sequencing, genomic research has yet to yield effective vaccines, diagnostics or therapeutics for the infectious diseases that burden developing countries. International research partnerships are needed more today than ever before, and we propose that increased participation by scientists in endemic areas would overcome current roadblocks and is an essential path towards translational research outcomes.
Funded by: Wellcome Trust
Nature reviews. Microbiology 2008;6;11;858-64
Genomic adaptation: a fungal perspective.
Nature reviews. Microbiology 2008;6;8;572-3
The genome of the simian and human malaria parasite Plasmodium knowlesi.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. email@example.com
Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.
Funded by: Wellcome Trust: 085775
Complete genome sequence of uropathogenic Proteus mirabilis, a master of both adherence and motility.
Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109-0620, USA.
The gram-negative enteric bacterium Proteus mirabilis is a frequent cause of urinary tract infections in individuals with long-term indwelling catheters or with complicated urinary tracts (e.g., due to spinal cord injury or anatomic abnormality). P. mirabilis bacteriuria may lead to acute pyelonephritis, fever, and bacteremia. Most notoriously, this pathogen uses urease to catalyze the formation of kidney and bladder stones or to encrust or obstruct indwelling urinary catheters. Here we report the complete genome sequence of P. mirabilis HI4320, a representative strain cultured in our laboratory from the urine of a nursing home patient with a long-term (≥30 days) indwelling urinary catheter. The genome is 4.063 Mb long and has a G+C content of 38.88%. There is a single plasmid consisting of 36,289 nucleotides. Annotation of the genome identified 3,685 coding sequences and seven rRNA loci. Analysis of the sequence confirmed the presence of previously identified virulence determinants, as well as a contiguous 54-kb flagellar regulon and 17 types of fimbriae. Genes encoding a potential type III secretion system were identified on a low-G+C-content genomic island containing 24 intact genes that appear to encode all components necessary to assemble a type III secretion system needle complex. In addition, the P. mirabilis HI4320 genome possesses four tandem copies of the zapE metalloprotease gene, genes encoding six putative autotransporters, an extension of the atf fimbrial operon to six genes, including an mrpJ homolog, and genes encoding at least five iron uptake mechanisms, two potential type IV secretion systems, and 16 two-component regulators.
Funded by: NIAID NIH HHS: AI059722, F32 AI068324, F32 AI068324-01A2, R01 AI059722, T32 AI007528, T32 AI7528; Wellcome Trust
Journal of bacteriology 2008;190;11;4027-37
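The G+C content quoted in the Proteus mirabilis abstract above is a simple base-composition statistic. A minimal Python sketch of how such a percentage can be computed is given here; the function name gc_content and the toy sequence are hypothetical illustrations, not the authors' pipeline or data.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so symbols such as
    N do not dilute the result. This is an illustrative helper, not the
    method used in the paper.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    toy = "ATGCGCATTA"  # hypothetical 10-base example, not genome data
    print(f"G+C content: {gc_content(toy):.2f}%")
```

Applied to a full genome sequence read from a FASTA file, the same calculation would give a genome-wide figure comparable to the one reported.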
Copy number variation and evolution in humans and chimpanzees.
School of Human Evolution & Social Change, Arizona State University, Tempe, Arizona 85287, USA.
Copy number variants (CNVs) underlie many aspects of human phenotypic diversity and provide the raw material for gene duplication and gene family expansion. However, our understanding of their evolutionary significance remains limited. We performed comparative genomic hybridization on a single human microarray platform to identify CNVs among the genomes of 30 humans and 30 chimpanzees as well as fixed copy number differences between species. We found that human and chimpanzee CNVs occur in orthologous genomic regions far more often than expected by chance and are strongly associated with the presence of highly homologous intrachromosomal segmental duplications. By adapting population genetic analyses for use with copy number data, we identified functional categories of genes that have likely evolved under purifying or positive selection for copy number changes. In particular, duplications and deletions of genes with inflammatory response and cell proliferation functions may have been fixed by positive selection and involved in the adaptive phenotypic differentiation of humans and chimpanzees.
Funded by: Howard Hughes Medical Institute; NCRR NIH HHS: C06 RR014491, C06 RR016483, RR014491, RR015087, RR016483, U42 RR015087; NHGRI NIH HHS: HG004221, P41 HG004221; Wellcome Trust
Genome research 2008;18;11;1698-710
Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. firstname.lastname@example.org
Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.
Funded by: Wellcome Trust
Journal of bacteriology 2008;190;7;2580-7
BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals.
Max Planck Institute for Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, D-01307 Dresden, Germany.
The interpretation of genome sequences requires reliable and standardized methods to assess protein function at high throughput. Here we describe a fast and reliable pipeline to study protein function in mammalian cells based on protein tagging in bacterial artificial chromosomes (BACs). The large size of the BAC transgenes ensures the presence of most, if not all, regulatory elements and results in expression that closely matches that of the endogenous gene. We show that BAC transgenes can be rapidly and reliably generated using 96-well-format recombineering. After stable transfection of these transgenes into human tissue culture cells or mouse embryonic stem cells, the localization, protein-protein and/or protein-DNA interactions of the tagged protein are studied using generic, tag-based assays. The same high-throughput approach will be generally applicable to other model systems.
Funded by: NHGRI NIH HHS: 1R01HG004428-01; Wellcome Trust: 077192
Nature methods 2008;5;5;409-15
Mosaic complementation demonstrates a regulatory role for myosin VIIa in actin dynamics of stereocilia.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. email@example.com
We have developed a bacterial artificial chromosome transgenesis approach that allowed the expression of myosin VIIa from the mouse X chromosome. We demonstrated the complementation of the Myo7a null mutant phenotype producing a fine mosaic of two types of sensory hair cells within inner ear epithelia of hemizygous transgenic females due to X inactivation. Direct comparisons between neighboring auditory hair cells that were different only with respect to myosin VIIa expression revealed that mutant stereocilia are significantly longer than those of their complemented counterparts. Myosin VIIa-deficient hair cells showed an abnormally persistent tip localization of whirlin, a protein directly linked to elongation of stereocilia, in stereocilia. Furthermore, myosin VIIa localized at the tips of all abnormally short stereocilia of mice deficient for either myosin XVa or whirlin. Our results strongly suggest that myosin VIIa regulates the establishment of a setpoint for stereocilium heights, and this novel role may influence their normal staircase-like arrangement within a bundle.
Funded by: Medical Research Council: G0300212, MC_QA137918; Wellcome Trust
Molecular and cellular biology 2008;28;5;1702-12
A large genome center's improvements to the Illumina sequencing system.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
The Wellcome Trust Sanger Institute is one of the world's largest genome centers, and a substantial amount of our sequencing is performed with 'next-generation' massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases. Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high-throughput environment, to reduce bias, tighten insert size distribution and reliably obtain high yields of data.
Funded by: Medical Research Council: G0701805; Wellcome Trust: 079643
Nature methods 2008;5;12;1005-10
An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs).
Institute of Cell and Molecular Science, Barts and the London, London E1 2AT, United Kingdom. firstname.lastname@example.org
We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts: genome-wide DNA methylation reference profiles of 13 normal somatic tissues, placenta, sperm, and an immortalized cell line; a visualization tool that has been integrated with the Ensembl genome browser; and a new algorithm for the analysis of immunoprecipitation-based DNA methylation profiles. We demonstrate the utility of our resource by identifying the first comprehensive genome-wide set of tissue-specific differentially methylated regions (tDMRs) that may play a role in cellular identity and the regulation of tissue-specific genome function. We also discuss the implications of our findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications.
Funded by: Cancer Research UK: C14303/A4646; Wellcome Trust: 077198
Genome research 2008;18;9;1518-29
MEROPS: the peptidase database.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. email@example.com
Peptidases (proteolytic enzymes or proteases), their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The organizational principle of the database is a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families and in turn grouped into clans. Important additions to the database include newly written, concise text annotations for peptidase clans and the small molecule inhibitors that are outside the scope of the standard classification; displays to show peptidase specificity compiled from our collection of known substrate cleavages; tables of peptidase-inhibitor interactions; and dynamically generated alignments of representatives of each protein species at the family level. New ways to compare peptidase and inhibitor complements between any two organisms whose genomes have been completely sequenced, or between different strains or subspecies of the same organism, have been devised.
Funded by: Wellcome Trust
Nucleic acids research 2008;36;Database issue;D320-5
The Protein Feature Ontology: a tool for the unification of protein feature annotations.
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. firstname.lastname@example.org
Motivation: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed and so far this has not been possible as there was no common vocabulary available that could be used as a standard language.
Results: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure. It comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO), and 40 non-positional terms that describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe.
Availability: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS).
Funded by: Wellcome Trust: 062023, 077198
Bioinformatics (Oxford, England) 2008;24;23;2767-72
Fission yeast MAP kinase Sty1 is recruited to stress-induced genes.
Paterson Institute for Cancer Research, University of Manchester, Wilmslow Road, Manchester, UK.
The stress-induced expression of many fission yeast genes is dependent upon the Sty1 mitogen-activated protein kinase (MAPK) and Atf1 transcription factor. Atf1 is phosphorylated by Sty1, yet this phosphorylation is not required for stress-induced gene expression, suggesting another mechanism exists whereby Sty1 activates transcription. Here we show that Sty1 associates with Atf1-dependent genes and is recruited to both their promoters and coding regions. This occurs in response to various stress conditions coincident with the kinetics of the activation of Sty1. Association with promoters is not a consequence of increased nuclear accumulation of Sty1 nor does it require the phosphorylation of Atf1. However, recruitment is completely abolished in a mutant lacking Sty1 kinase activity. Both Atf1 and its binding partner Pcr1 are required for association of Sty1 with Atf1-dependent promoters, suggesting that this heterodimer must be intact for optimal recruitment of the MAPK. However, many Atf1-dependent genes are still expressed in a pcr1Δ mutant but with significantly delayed kinetics, thus providing an explanation for the relatively mild stress sensitivity displayed by pcr1Δ. Consistent with this delay, Sty1 and Atf1 cannot be detected at these promoters in this condition, suggesting that their association with chromatin is weak or transient in the absence of Pcr1.
Funded by: Cancer Research UK: A6517; Wellcome Trust: 077118, 098051
The Journal of biological chemistry 2008;283;15;9945-56
Use and misuse of the gene ontology annotations.
Carnegie Institution for Science, Department of Plant Biology, 260 Panama Street, Stanford, California 94305, USA. email@example.com
The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO, which, when overlooked, can cause erroneous results, and address how these pitfalls can be avoided.
Nature reviews. Genetics 2008;9;7;509-15
Male-pattern baldness susceptibility locus at 20p11.
Department of Twin Research and Genetic Epidemiology, King's College London, London SE1 7EH, UK.
We conducted a genome-wide association study for androgenic alopecia in 1,125 men and identified a newly associated locus at chromosome 20p11.22, confirmed in three independent cohorts (n = 1,650; OR = 1.60, P = 1.1 × 10^-14 for rs1160312). The one man in seven who harbors risk alleles at both 20p11.22 and AR (encoding the androgen receptor) has a sevenfold-increased odds of androgenic alopecia (OR = 7.12, P = 3.7 × 10^-15).
Funded by: Wellcome Trust: 077011
Nature genetics 2008;40;11;1282-4
Maximum-likelihood estimation of site-specific mutation rates in human mitochondrial DNA from partial phylogenetic classification.
Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel. firstname.lastname@example.org
The mitochondrial DNA hypervariable segment I (HVS-I) is widely used in studies of human evolutionary genetics, and therefore accurate estimates of mutation rates among nucleotide sites in this region are essential. We have developed a novel maximum-likelihood methodology for estimating site-specific mutation rates from partial phylogenetic information, such as haplogroup association. The resulting estimation problem is a generalized linear model, with a nonstandard link function. We develop inference and bias correction tools for our estimates and a hypothesis-testing approach for site independence. We demonstrate our methodology using 16,609 HVS-I samples from the Genographic Project. Our results suggest that mutation rates among nucleotide sites in HVS-I are highly variable. The 16,400-16,500 region exhibits significantly lower rates compared to other regions, suggesting potential functional constraints. Several loci identified in the literature as possible termination-associated sequences (TAS) do not yield statistically slower rates than the rest of HVS-I, casting doubt on their functional importance. Our tests do not reject the null hypothesis of independent mutation rates among nucleotide sites, supporting the use of site-independence assumption for analyzing HVS-I. Potential extensions of our methodology include its application to estimation of mutation rates in other genetic regions, like Y chromosome short tandem repeats.
Funded by: Wellcome Trust
TreeFam: 2008 Update.
Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China.
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1,314 families and automatically generated trees for another 14,351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.
Funded by: Wellcome Trust
Nucleic acids research 2008;36;Database issue;D735-40
Pfam 10 years on: 10,000 families and still growing.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
Funded by: Wellcome Trust: 087656
Briefings in bioinformatics 2008;9;3;210-9
Mendelian randomisation studies of type 2 diabetes: future prospects.
MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK. email@example.com
Funded by: Medical Research Council: MC_U106179471, MC_U106188470; Wellcome Trust: 077016
LDL-cholesterol concentrations: a genome-wide association study.
Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, Cambridge, UK. firstname.lastname@example.org
Background: LDL cholesterol has a causal role in the development of cardiovascular disease. Improved understanding of the biological mechanisms that underlie the metabolism and regulation of LDL cholesterol might help to identify novel therapeutic targets. We therefore did a genome-wide association study of LDL-cholesterol concentrations.
Methods: We used genome-wide association data from up to 11,685 participants with measures of circulating LDL-cholesterol concentrations across five studies, including data for 293,461 autosomal single nucleotide polymorphisms (SNPs) with a minor allele frequency of 5% or more that passed our quality control criteria. We also used data from a second genome-wide array in up to 4,337 participants from three of these five studies, with data for 290,140 SNPs. We did replication studies in two independent populations consisting of up to 4,979 participants. Statistical approaches, including meta-analysis and linkage disequilibrium plots, were used to refine association signals; we analysed pooled data from all seven populations to determine the effect of each SNP on variations in circulating LDL-cholesterol concentrations.
Findings: In our initial scan, we found two SNPs (rs599839 [p=1.7 × 10^-15] and rs4970834 [p=3.0 × 10^-11]) that showed genome-wide statistical association with LDL cholesterol at chromosomal locus 1p13.3. The second genome screen found a third statistically associated SNP at the same locus (rs646776 [p=4.3 × 10^-9]). Meta-analysis of data from all studies showed an association of SNPs rs599839 (combined p=1.2 × 10^-33) and rs646776 (p=4.8 × 10^-20) with LDL-cholesterol concentrations. SNPs rs599839 and rs646776 both explained around 1% of the variation in circulating LDL-cholesterol concentrations and were associated with about 15% of an SD change in LDL cholesterol per allele, assuming an SD of 1 mmol/L.
Interpretation: We found evidence for a novel locus for LDL cholesterol on chromosome 1p13.3. These results potentially provide insight into the biological mechanisms that underlie the regulation of LDL cholesterol and might help in the discovery of novel therapeutic targets for cardiovascular disease.
Funded by: Medical Research Council: G0000934, G0701863, MC_QA137934, MC_U105630924, MC_U106188470; Wellcome Trust: 068545/Z/02
Lancet (London, England) 2008;371;9611;483-91
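The per-allele effect size reported in the LDL-cholesterol abstract above (about 15% of an SD, with an assumed SD of 1 mmol/L) can be turned into absolute units with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the additive scaling across allele counts is an assumption made for the example, and the factor of 38.67 mg/dL per mmol/L is the conventional conversion for cholesterol rather than a value taken from the paper.

```python
MG_DL_PER_MMOL_L = 38.67  # conventional cholesterol unit conversion (assumption, not from the abstract)


def per_allele_shift(sd_fraction: float, sd_mmol_l: float) -> float:
    """Absolute LDL-cholesterol change per allele, in mmol/L."""
    return sd_fraction * sd_mmol_l


def shift_for_genotype(n_alleles: int, sd_fraction: float, sd_mmol_l: float) -> float:
    """Change for 0, 1 or 2 copies of the allele, assuming a purely additive effect."""
    if n_alleles not in (0, 1, 2):
        raise ValueError("n_alleles must be 0, 1 or 2")
    return n_alleles * per_allele_shift(sd_fraction, sd_mmol_l)


if __name__ == "__main__":
    # Values quoted in the abstract: about 15% of an SD, SD assumed to be 1 mmol/L.
    for copies in (0, 1, 2):
        mmol = shift_for_genotype(copies, 0.15, 1.0)
        print(f"{copies} allele(s): {mmol:.2f} mmol/L ({mmol * MG_DL_PER_MMOL_L:.1f} mg/dL)")
```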
New case of interstitial deletion 12(q15-q21.2) in a girl with facial dysmorphism and mental retardation.
Département de Génétique, Hôpital Necker Enfants Malades, Paris, France.
Interstitial deletions of the long arm of chromosome 12 are rare rearrangements with only 15 cases reported in the literature. The phenotype may include facial dysmorphism, developmental delay, ectodermal abnormalities, and cardiac and renal malformations, depending on the position of the breakpoints. Here, we describe a third case of 12(q15-q21.2) deletion ascertained through CGH-array analyses and provide a 5-year follow-up. The patient presented with pre- and postnatal growth retardation, congenital heart defect, developmental delay, and facial dysmorphism changing with age, underlining the importance of long-term follow-up. We compared this new case with previous observations of 12q deletions in order to propose phenotype-karyotype correlations.
American journal of medical genetics. Part A 2008;146A;1;93-6
Repeated replication and a prospective meta-analysis of the association between chromosome 9p21.3 and coronary artery disease.
Medizinische Klinik II, Universität zu Lübeck, Lübeck, Germany. email@example.com
Background: Recently, genome-wide association studies identified variants on chromosome 9p21.3 as affecting the risk of coronary artery disease (CAD). We investigated the association of this locus with CAD in 7 case-control studies and undertook a meta-analysis.
Methods and results: A single-nucleotide polymorphism (SNP), rs1333049, representing the 9p21.3 locus, was genotyped in 7 case-control studies involving a total of 4,645 patients with myocardial infarction or CAD and 5,177 controls. The mode of inheritance was determined. In addition, in 5 of the 7 studies, we genotyped 3 additional SNPs to assess a risk-associated haplotype (ACAC). Finally, a meta-analysis of the present data and previously published samples was conducted. A limited fine mapping of the locus was performed. The risk allele (C) of the lead SNP, rs1333049, was uniformly associated with CAD in each study (P<0.05). In a pooled analysis, the odds ratio per copy of the risk allele was 1.29 (95% confidence interval, 1.22 to 1.37; P=0.0001). Haplotype analysis further suggested that this effect was not homogeneous across the haplotypic background (test for interaction, P=0.0079). An autosomal-additive mode of inheritance best explained the underlying association. The meta-analysis of the rs1333049 SNP in 12,004 cases and 28,949 controls increased the overall level of evidence for association with CAD to P=6.04 × 10^-10 (odds ratio, 1.24; 95% confidence interval, 1.20 to 1.29). Genotyping of 31 additional SNPs in the region identified several with a highly significant association with CAD, but none had predictive information beyond that of the rs1333049 SNP.
Conclusions: This broad replication provides unprecedented evidence for association between genetic variants at chromosome 9p21.3 and risk of CAD.
Funded by: British Heart Foundation; Wellcome Trust: 077011, 082371
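The odds ratios and 95% confidence intervals quoted in the 9p21.3 abstract above are related through the standard error of the log odds ratio. The Python sketch below shows that relationship under the usual normal approximation (a Wald interval, z = 1.96); the function names are hypothetical and nothing here is claimed to be the authors' own analysis code.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of log(OR) implied by a Wald confidence interval."""
    if not 0 < lower < upper:
        raise ValueError("expected 0 < lower < upper")
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_from_or(odds_ratio: float, se_log_or: float, z: float = Z_95) -> tuple[float, float]:
    """Wald confidence interval for an odds ratio given the SE of its logarithm."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


if __name__ == "__main__":
    # Pooled estimate quoted in the abstract: OR 1.29 (95% CI 1.22 to 1.37).
    se = se_from_ci(1.22, 1.37)
    low, high = ci_from_or(1.29, se)
    print(f"SE of log(OR): {se:.4f}")
    print(f"Reconstructed 95% CI: {low:.2f} to {high:.2f}")
```

Because published values are rounded, the reconstructed interval matches the reported one only to about two decimal places.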
Protein interactions in human genetic diseases.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. firstname.lastname@example.org
We present a novel method that combines protein structure information with protein interaction data to identify residues that form part of an interaction interface. Our prediction method can retrieve interaction hotspots with an accuracy of 60% (at a 20% false positive rate). The method was applied to all mutations in the Online Mendelian Inheritance in Man (OMIM) database, predicting 1,428 mutations to be related to an interaction defect. Combining predicted and hand-curated sets, we discuss how mutations affect protein interactions in general.
Funded by: Wellcome Trust: 087656
Genome biology 2008;9;1;R9
Robust, persistent transgene expression in human embryonic stem cells is achieved with AAVS1-targeted integration.
Department of Surgery, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0XY, United Kingdom. email@example.com
Silencing and variegated transgene expression are poorly understood problems that can interfere with gene function studies in human embryonic stem cells (hESCs). We show that transgene expression (enhanced green fluorescent protein [EGFP]) from random integration sites in hESCs is affected by variegation and silencing, with only half of hESCs expressing the transgene, which is gradually lost after withdrawal of selection and differentiation. We tested the hypothesis that a transgene integrated into the adeno-associated virus type 2 (AAV2) target region on chromosome 19, known as the AAVS1 locus, would maintain transgene expression in hESCs. When we used AAV2 technology to target the AAVS1 locus, 4.16% of hESC clones achieved AAVS1-targeted integration. Targeted clones expressed Oct-4, stage-specific embryonic antigen-3 (SSEA3), and Tra-1-60 and differentiated into all three primary germ layers. EGFP expression from the AAVS1 locus showed significantly reduced variegated expression when in selection, with 90% ± 4% of cells expressing EGFP compared with 57% ± 32% for randomly integrated controls, and reduced tendency to undergo silencing, with 86% ± 7% of hESCs expressing EGFP 25 days after withdrawal of selection compared with 39% ± 31% for randomly integrated clones. In addition, quantitative polymerase chain reaction analysis of hESCs also indicated significantly higher levels of EGFP mRNA in AAVS1-targeted clones as compared with randomly integrated clones. Transgene expression from the AAVS1 locus was shown to be stable during hESC differentiation, with more than 90% of cells expressing EGFP after 15 days of differentiation, as compared with approximately 30% for randomly integrated clones. These results demonstrate the utility of transgene integration at the AAVS1 locus in hESCs and its potential clinical application.
Funded by: Medical Research Council: G0300300, G0300723, G0600275
Stem cells (Dayton, Ohio) 2008;26;2;496-504
Conservation of the H19 noncoding RNA and H19-IGF2 imprinting mechanism in therians.
The Babraham Institute, Laboratory of Developmental Genetics and Imprinting, Cambridge CB22 3AT, UK.
Comparisons between eutherians and marsupials suggest limited conservation of the molecular mechanisms that control genomic imprinting in mammals. We have studied the evolution of the imprinted IGF2-H19 locus in therians. Although marsupial orthologs of protein-coding exons were easily identified, the use of evolutionarily conserved regions and low-stringency Bl2seq comparisons was required to delineate a candidate H19 noncoding RNA sequence. The therian H19 orthologs show miR-675 and exon structure conservation, suggesting functional selection on both features. Transcription start site sequences and poly(A) signals are also conserved. As in eutherians, marsupial H19 is maternally expressed and paternal methylation upstream of the gene originates in the male germline, encompasses a CTCF insulator, and spreads somatically into the H19 gene. The conservation in all therians of the mechanism controlling imprinting of the IGF2-H19 locus suggests a sequential model of imprinting evolution.
Funded by: Biotechnology and Biological Sciences Research Council: BBS/E/B/0000C214, BBS/E/B/0000S119; Medical Research Council: G0400154; Wellcome Trust
Nature genetics 2008;40;8;971-6
The novel mouse mutation Oblivion inactivates the PMCA2 pump and causes progressive hearing loss.
Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, UK.
Progressive hearing loss is common in the human population, but we have few clues to the molecular basis. Mouse mutants with progressive hearing loss offer valuable insights, and ENU (N-ethyl-N-nitrosourea) mutagenesis is a useful way of generating models. We have characterised a new ENU-induced mouse mutant, Oblivion (allele symbol Obl), showing semi-dominant inheritance of hearing impairment. Obl/+ mutants showed increasing hearing impairment from post-natal day (P)20 to P90, and loss of auditory function was followed by a corresponding base to apex progression of hair cell degeneration. Obl/Obl mutants were small, showed severe vestibular dysfunction by 2 weeks of age, and were completely deaf from birth; sensory hair cells were completely degenerate in the basal turn of the cochlea, although hair cells appeared normal in the apex. We mapped the mutation to Chromosome 6. Mutation analysis of Atp2b2 showed a missense mutation (2630C→T) in exon 15, causing a serine to phenylalanine substitution (S877F) in transmembrane domain 6 of the PMCA2 pump, the resident Ca2+ pump of hair cell stereocilia. Transmembrane domain mutations in these pumps generally are believed to be incompatible with normal targeting of the protein to the plasma membrane. However, analyses of hair cells in cultured utricular maculae of Obl/Obl mice and of the mutant Obl pump in model cells showed that the protein was correctly targeted to the plasma membrane. Biochemical and biophysical characterisation showed that the pump had lost a significant portion of its non-stimulated Ca2+ exporting ability. These findings can explain the progressive loss of auditory function, and indicate the limits in our ability to predict mechanism from sequence alone.
Funded by: Medical Research Council: G0300212, MC_QA137918; Telethon: GGP04169; Wellcome Trust
PLoS genetics 2008;4;10;e1000238
Large recurrent microdeletions associated with schizophrenia.
CNS Division, deCODE genetics, Sturlugata 8, IS-101 Reykjavík, Iceland.
Reduced fecundity, associated with severe mental disorders, places negative selection pressure on risk alleles and may explain, in part, why common variants have not been found that confer risk of disorders such as autism, schizophrenia and mental retardation. Thus, rare variants may account for a larger fraction of the overall genetic risk than previously assumed. In contrast to rare single nucleotide mutations, rare copy number variations (CNVs) can be detected using genome-wide single nucleotide polymorphism arrays. This has led to the identification of CNVs associated with mental retardation and autism. In a genome-wide search for CNVs associating with schizophrenia, we used a population-based sample to identify de novo CNVs by analysing 9,878 transmissions from parents to offspring. The 66 de novo CNVs identified were tested for association in a sample of 1,433 schizophrenia cases and 33,250 controls. Three deletions at 1q21.1, 15q11.2 and 15q13.3 showing nominal association with schizophrenia in the first sample (phase I) were followed up in a second sample of 3,285 cases and 7,951 controls (phase II). All three deletions significantly associate with schizophrenia and related psychoses in the combined sample. The identification of these rare, recurrent risk variants, having occurred independently in multiple founders and being subject to negative selection, is important in itself. CNV analysis may also point the way to the identification of additional and more prevalent risk variants in genes and pathways involved in schizophrenia.
Funded by: Department of Health: PDA/02/06/016; Medical Research Council: G0901310; NIMH NIH HHS: R01 MH078075, R01MH71425-01A1; Wellcome Trust: 089061
Insights from the complete genome sequence of Mycobacterium marinum on the evolution of Mycobacterium tuberculosis.
Department of Microbiology, Monash University, Clayton 3800, Australia.
Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins, and related ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is used widely as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, as they share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.
Funded by: NIAID NIH HHS: R01 AI036396; Wellcome Trust
Genome research 2008;18;5;729-41
A DNA transposon-based approach to validate oncogenic mutations in the mouse.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, England.
Large-scale cancer genome projects will soon be able to sequence many cancer genomes to comprehensively identify genetic changes in human cancer. Genome-wide association studies have also identified putative cancer associated loci. Functional validation of these genetic mutations in vivo is becoming a challenge. We describe here a DNA transposon-based platform that permits us to explore the oncogenic potential of genetic mutations in the mouse. Briefly, promoter-less human cancer gene cDNAs were first cloned into Sleeping Beauty (SB) transposons. DNA transposition in the mouse that carried both the transposons and the SB transposase made it possible for the cDNAs to be expressed from an appropriate endogenous genomic locus and in the relevant cell types for tumor development. Consequently, these mice developed a broad spectrum of tumors at very early postnatal stages. This technology thus complements the large-scale cancer genome projects.
Funded by: Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2008;105;50;19904-9
Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project.
Funded by: Biotechnology and Biological Sciences Research Council: BB/E006248/1, BB/E025080/1, E025080/1; Medical Research Council: MC_U142684171; NHGRI NIH HHS: U54 HG004028; NIBIB NIH HHS: EB005034-01, R01 EB005034, R01 EB005034-04
Nature biotechnology 2008;26;8;889-96
Whole genome-amplified DNA: insights and imputation.
Funded by: Medical Research Council: G0600230, G19/9, MC_U106179471; Wellcome Trust: 077011, 077016
Nature methods 2008;5;4;279-80
Maternal footprints of Southeast Asians in North India.
Centre for Cellular and Molecular Biology, Hyderabad, India.
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNA(Lys) region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
Funded by: Wellcome Trust: 077009
Human heredity 2008;66;1;1-9
Characterization of 6q deletions in mature B cell lymphomas and childhood acute lymphoblastic leukemia.
Medical Genetics Unit, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden.
The study was undertaken with the aim of outlining deletion patterns involving the long arm of chromosome 6, a common abnormality in lymphoproliferative disorders. Using a chromosome 6 specific tile path array, 60 samples from a total of 49 cases with mantle cell lymphoma (MCL), de novo diffuse large B-cell lymphoma (DLBCL), transformed DLBCL as well as preceding follicular lymphoma (FL), and childhood acute lymphoblastic leukemia (ALL), were characterized. Twenty-six of the studied cases, representing all diagnoses, showed a 6q deletion among which 85% involved a 3 Mb region in 6q21. The minimal deleted interval in 6q21 encompasses the FOXO3A, PRDM1 and HACE1 candidate genes. The PRDM1 gene was found homozygously deleted in a case of DLBCL. Moreover, in two DLBCL cases, an overlapping homozygous deletion was identified in 6q23.3 - 24.1, encompassing the TNFAIP3 gene among others. Taken together, we refined the deletion pattern within the long arm of chromosome 6 in four different types of hematological malignancies, suggesting the location of tumor suppressor genes involved in the tumor progression.
Leukemia & lymphoma 2008;49;3;477-87
Phenotypical characteristics of idiopathic infantile nystagmus with and without mutations in FRMD7.
Ophthalmology Group, University of Leicester, Leicester, UK.
Idiopathic infantile nystagmus (IIN) consists of involuntary oscillations of the eyes. The familial form is most commonly X-linked. We recently found mutations in a novel gene FRMD7 (Xq26.2), which provided an opportunity to investigate a genetically defined and homogeneous group of patients with nystagmus. We compared clinical features and eye movement recordings of 90 subjects with a mutation in the gene (FRMD7 group) to 48 subjects without mutations but with clinical IIN (non-FRMD7 group). Fifty-eight female obligate carriers of the mutation were also investigated. The median visual acuity (VA) was 0.2 logMAR (Snellen equivalent 6/9) in both groups and most patients had good stereopsis. The prevalence of strabismus was also similar (FRMD7: 7.8%, non-FRMD7: 10%). The presence of anomalous head posture (AHP) was significantly higher in the non-FRMD7 group (P < 0.0001). The amplitude of nystagmus was more strongly dependent on the direction of gaze in the FRMD7 group, being lower at primary position (P < 0.0001), compared to the non-FRMD7 group (P = 0.83). Pendular nystagmus waveforms were also more frequent in the FRMD7 group (P = 0.003). Fifty-three percent of the obligate female carriers of an FRMD7 mutation were clinically affected. The VAs in affected females were slightly better compared to affected males (P = 0.014). Subnormal optokinetic responses were found in a subgroup of obligate unaffected carriers, which may be interpreted as a sub-clinical manifestation. FRMD7 is a major cause of X-linked IIN. Most clinical and eye movement characteristics were similar in the FRMD7 group and non-FRMD7 group with most patients having good VA and stereopsis and low incidence of strabismus. Fewer patients in the FRMD7 group had AHPs, their amplitude of nystagmus being lower in primary position. Our findings are helpful in the clinical identification of IIN and genetic counselling of nystagmus patients.
Brain : a journal of neurology 2008;131;Pt 5;1259-67
Vive la différence.
Nature reviews. Microbiology 2008;6;7;502-3
Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways.
The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
We have determined the complete genome sequences of a host-promiscuous Salmonella enterica serovar Enteritidis PT4 isolate P125109 and a chicken-restricted Salmonella enterica serovar Gallinarum isolate 287/91. Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendant of S. Enteritidis. Significantly, the genome of S. Gallinarum has undergone extensive degradation through deletion and pseudogene formation. Comparison of the pseudogenes in S. Gallinarum with those identified previously in other host-adapted bacteria reveals the loss of many common functional traits and provides insights into possible mechanisms of host and tissue adaptation. We propose that experimental analysis in chickens and mice of S. Enteritidis harboring mutations in functional homologs of the pseudogenes present in S. Gallinarum could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.
Funded by: Wellcome Trust
Genome research 2008;18;10;1624-37
Chlamydia trachomatis: genome sequence analysis of lymphogranuloma venereum isolates.
The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Chlamydia trachomatis is the most common cause of sexually transmitted infections in the UK, a statistic that is also reflected globally. There are three biovariants of C. trachomatis: trachoma (serotypes A-C) and two sexually transmitted pathovars, serotypes D-K and lymphogranuloma venereum (LGV). Trachoma isolates and the sexually transmitted serotypes D-K are noninvasive, whereas the LGV strains are invasive, causing a disseminating infection of the local draining lymph nodes. Genome sequences are available for single isolates from the trachoma (serotype A) and sexually transmitted (serotype D) biotypes. We sequenced two isolates from the remaining biotype, LGV: a long-term laboratory passaged strain and the recent "epidemic" LGV isolate causing proctitis. Although the genome of the LGV strain shows no additional genes that could account for the differences in disease outcome, we found evidence of functional gene loss and identified regions of heightened sequence variation that have previously been shown to be important sites for interstrain recombination. We have used new sequencing technologies to show that the recent clinical LGV isolate causing proctitis is unlikely to be a newly emerged strain but is most probably an old strain with relatively new clinical manifestations.
Funded by: Wellcome Trust: 080348
Genome research 2008;18;1;161-71
Evolutionary plasticity of genetic interaction networks.
The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Non-additive genetic interactions contribute to many genetic disorders, but they are extremely difficult to predict. Here we show that genetic interactions identified in yeast, unlike gene functions or protein interactions, are not highly conserved in animals. Genetic interactions are therefore unlikely to represent simple redundancy between genes or pathways, and genetic interactions from yeast do not directly predict genetic interactions in higher eukaryotes, including humans.
Funded by: Wellcome Trust
Nature genetics 2008;40;4;390-1
Determination and validation of principal gene products.
Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
Motivation: Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear.
Results: We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198
Bioinformatics (Oxford, England) 2008;24;1;11-7
Germline rates of de novo meiotic deletions and duplications causing several genomic disorders.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.
Meiotic recombination between highly similar duplicated sequences (nonallelic homologous recombination, NAHR) generates deletions, duplications, inversions and translocations, and it is responsible for genetic diseases known as 'genomic disorders', most of which are caused by altered copy number of dosage-sensitive genes. NAHR hot spots have been identified within some duplicated sequences. We have developed sperm-based assays to measure the de novo rate of reciprocal deletions and duplications at four NAHR hot spots. We used these assays to dissect the relative rates of NAHR between different pairs of duplicated sequences. We show that (i) these NAHR hot spots are specific to meiosis, (ii) deletions are generated at a higher rate than their reciprocal duplications in the male germline and (iii) some of these genomic disorders are likely to have been underascertained clinically, most notably that resulting from the duplication of 7q11, the reciprocal of the deletion causing Williams-Beuren syndrome.
Funded by: Wellcome Trust: 077008, 077014
Nature genetics 2008;40;1;90-5
Long-range, high-throughput haplotype determination via haplotype-fusion PCR and ligation haplotyping.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Ligation Haplotyping is a robust, novel method for experimental determination of haplotypes over long distances, which can be applied to assaying both sequence and structural variation. The simplicity and efficacy of the method for genotyping large chromosomal rearrangements and haplotyping SNPs over long distances make it a valuable and powerful addition to the methodological repertoire, which will be beneficial to studies of population genetics and evolution, disease association and inheritance, and genomic variation. We illustrate the versatility of the method both by genotyping a Yp paracentric inversion, found in approximately 60% of Northwest European males, that strongly influences the germline rate of infertility-causing XY translocations and by haplotyping two autosomal SNPs that lie 16.4 kb apart on chromosome 7, and which influence an individual's susceptibility to systemic lupus erythematosus.
Funded by: Wellcome Trust
Nucleic acids research 2008;36;13;e82
An evolutionary perspective on Y-chromosomal variation and male infertility.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, UK.
Genetic variation on the Y chromosome is one of the best-documented causes of male infertility, but the genes responsible have still not been identified. This review discusses how an evolutionary perspective may help with interpretation of the data available and suggest novel approaches to identify key genes. Comparison with the chimpanzee Y chromosome indicates that USP9Y is dispensable in apes, but that multiple copies of TSPY1 may have an important role. Comparisons between infertile and control groups in search of genetic susceptibility factors are more complex for the Y chromosome than for the rest of the genome because of population stratification and require unusual levels of confirmation. But the extreme population stratification exhibited by the Y also allows populations particularly suitable for some studies to be identified, such as the partial AZFc deletions common in Northern European populations where further dissection of this complex structural region would be facilitated.
Funded by: Wellcome Trust
International journal of andrology 2008;31;4;376-82
Loss of Rassf1a cooperates with Apc(Min) to accelerate intestinal tumourigenesis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Promoter methylation of the RAS-association domain family 1, isoform A gene (RASSF1A) is one of the most frequent events found in human tumours. In this study we set out to test the hypothesis that loss of Rassf1a can cooperate with inactivation of the adenomatous polyposis coli (Apc) gene to accelerate intestinal tumourigenesis using the Apc-Min (Apc(Min/+)) mouse model, as mutational or deletional inactivation of APC is a frequent early event in the genesis of intestinal cancer. Further, loss of RASSF1A has also been reported to occur in premalignant adenomas of the bowel. RASSF1A has been implicated in an array of pivotal cellular processes, including regulation of the cell cycle, apoptosis, microtubule stability and most recently in the beta-catenin signalling pathway. By interbreeding isoform specific Rassf1a knockout mice with Apc(+/Min) mice, we showed that loss of Rassf1a results in a significant increase in adenomas of the small intestine and accelerated intestinal tumourigenesis leading to the earlier death of adenocarcinoma-bearing mice and decreased overall survival. Comparative genomic hybridization of adenomas from Rassf1a(-/-); Apc(+/Min) mice revealed no evidence of aneuploidy or gross chromosomal instability (no difference to adenomas from Rassf1a(+/+); Apc(+/Min) mice). Immunohistochemical analysis of adenomas revealed increased nuclear beta-catenin accumulation in adenomas from Rassf1a(-/-); Apc(+/Min) mice, compared to those from Rassf1a(+/+); Apc(+/Min) mice, but no differences in proliferation marker (Ki67) staining patterns. Collectively these data demonstrate cooperation between inactivation of Rassf1a and Apc resulting in accelerated intestinal tumourigenesis, with adenomas showing increased nuclear accumulation of beta-catenin, supporting a mechanistic link via loss of the known interaction of Rassf1 with beta-TrCP that usually mediates degradation of beta-catenin.
Funded by: Cancer Research UK: A8449; Wellcome Trust: 079643
High-resolution mapping of expression-QTLs yields insight into human gene regulation.
Department of Human Genetics, The University of Chicago, Chicago, IL, USA.
Recent studies of the HapMap lymphoblastoid cell lines have identified large numbers of quantitative trait loci for gene expression (eQTLs). Reanalyzing these data using a novel Bayesian hierarchical model, we were able to create a surprisingly high-resolution map of the typical locations of sites that affect mRNA levels in cis. Strikingly, we found a strong enrichment of eQTLs in the 250 bp just upstream of the transcription end site (TES), in addition to an enrichment around the transcription start site (TSS). Most eQTLs lie either within genes or close to genes; for example, we estimate that only 5% of eQTLs lie more than 20 kb upstream of the TSS. After controlling for position effects, SNPs in exons are approximately 2-fold more likely than SNPs in introns to be eQTLs. Our results suggest an important role for mRNA stability in determining steady-state mRNA levels, and highlight the potential of eQTL mapping as a high-resolution tool for studying the determinants of gene regulation.
Funded by: Howard Hughes Medical Institute; NHGRI NIH HHS: HG002772, HG02585-01, R01 HG002585, R01 HG002772; NIGMS NIH HHS: GM077959, R01 GM077959
PLoS genetics 2008;4;10;e1000214
Habitual energy expenditure modifies the association between NOS3 gene polymorphisms and blood pressure.
Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Cambridge, UK.
Background: The endothelial nitric-oxide synthase (NOS3) gene encodes the enzyme (eNOS) that synthesizes the molecule nitric oxide, which facilitates endothelium-dependent vasodilation in response to physical activity. Thus, energy expenditure may modify the association between the genetic variation at NOS3 and blood pressure.
Methods: To test this hypothesis, we genotyped 11 NOS3 polymorphisms, capturing all common variations, in 726 men and women from the Medical Research Council (MRC) Ely Study (age (mean +/- s.d.): 55 +/- 10 years, body mass index: 26.4 +/- 4.1 kg/m(2)). Habitual/non-resting energy expenditure (NREE) was assessed via individually calibrated heart rate monitoring over 4 days.
Results: The intronic variant, IVS25+15 [G-->A], was significantly associated with blood pressure; GG homozygotes had significantly lower levels of diastolic blood pressure (DBP) (-2.8 mm Hg; P = 0.016) and systolic blood pressure (SBP) (-1.9 mm Hg; P = 0.018) than A-allele carriers. The interaction between NREE and IVS25+15 was also significant for both DBP (P = 0.006) and SBP (P = 0.026), in such a way that the effect of the GG-genotype on blood pressure was stronger in individuals with higher NREE (DBP: -4.9 mm Hg, P = 0.02; SBP: -3.8 mm Hg, P = 0.03 for the third tertile). Similar results were observed when the outcome was dichotomously defined as hypertension.
Conclusions: In summary, the NOS3 IVS25+15 is directly associated with blood pressure and hypertension in white Europeans. However, the associations are most evident in the individuals with the highest NREE. These results need further replication and ideally have to be tested in a trial before being informative for targeted disease prevention. Eventually, the selection of individuals for lifestyle intervention programs could be guided by knowledge of genotype.
Funded by: Medical Research Council: MC_U106179471, MC_U106179473, MC_U106188470, U.1061.00.001 (79471), U.1061.00.005(79473); Wellcome Trust: 077016, 087636
American journal of hypertension 2008;21;3;297-302
The Gly482Ser genotype at the PPARGC1A gene and elevated blood pressure: a meta-analysis involving 13,949 individuals.
Department of Public Health & Clinical Medicine, Umeå University Hospital, Umeå, Sweden.
The protein encoded by the PPARGC1A gene is expressed at high levels in metabolically active tissues and is involved in the control of oxidative stress via reactive oxygen species detoxification. Several recent reports suggest that the PPARGC1A Gly482Ser (rs8192678) missense polymorphism may relate inversely with blood pressure. We used conventional meta-analysis methods to assess the association between Gly482Ser and systolic (SBP) or diastolic blood pressures (DBP) or hypertension in 13,949 individuals from 17 studies, of which 6,042 were previously unpublished observations. The studies comprised cohorts of white European, Asian, and American Indian adults, and adolescents from South America. Stratified analyses were conducted to control for population stratification. Pooled genotype frequencies were 0.47 (Gly482Gly), 0.42 (Gly482Ser), and 0.11 (Ser482Ser). We found no evidence of association between Gly482Ser and SBP [Gly482Gly: mean = 131.0 mmHg, 95% confidence interval (CI) = 130.5-131.5 mmHg; Gly482Ser: mean = 133.1 mmHg, 95% CI = 132.6-133.6 mmHg; Ser482Ser: mean = 133.5 mmHg, 95% CI = 132.5-134.5 mmHg; P = 0.409] or DBP (Gly482Gly: mean = 80.3 mmHg, 95% CI = 80.0-80.6 mmHg; Gly482Ser: mean = 81.5 mmHg, 95% CI = 81.2-81.8 mmHg; Ser482Ser: mean = 82.1 mmHg, 95% CI = 81.5-82.7 mmHg; P = 0.651). Contrary to previous reports, we did not observe significant effect modification by sex (SBP, P = 0.966; DBP, P = 0.715). We were also unable to confirm the previously reported association between the Ser482 allele and hypertension [odds ratio: 0.97, 95% CI = 0.87-1.08, P = 0.585]. These results were materially unchanged when analyses were focused on whites only. However, statistical evidence of gene-age interaction was apparent for DBP [Gly482Gly: 73.5 (72.8, 74.2), Gly482Ser: 77.0 (76.2, 77.8), Ser482Ser: 79.1 (77.4, 80.9), P = 4.20 x 10(-12)] and SBP [Gly482Gly: 121.4 (120.4, 122.5), Gly482Ser: 125.9 (124.6, 127.1), Ser482Ser: 129.2 (126.5, 131.9), P = 7.20 x 10(-12)] in individuals <50 yr (n = 2,511); these genetic effects were absent in those older than 50 yr (n = 5,088) (SBP, P = 0.41; DBP, P = 0.51). Our findings suggest that the PPARGC1A Ser482 allele may be associated with higher blood pressure, but this is only apparent in younger adults.
Funded by: Intramural NIH HHS; Medical Research Council: MC_U106179471, MC_U106188470; Wellcome Trust: 077016
Journal of applied physiology (Bethesda, Md. : 1985) 2008;105;4;1352-8
Nature reviews. Microbiology 2008;6;3;176-7
Chromosomal transposition of PiggyBac in mouse embryonic stem cells.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
Transposon systems are widely used for generating mutations in various model organisms. PiggyBac (PB) has recently been shown to transpose efficiently in the mouse germ line and other mammalian cell lines. To facilitate PB's application in mammalian genetics, we characterized the properties of the PB transposon in mouse embryonic stem (ES) cells. We first measured the transposition efficiencies of PB transposon in mouse embryonic stem cells. We next constructed a PB/SB hybrid transposon to compare PB and Sleeping Beauty (SB) transposon systems and demonstrated that PB transposition was inhibited by DNA methylation. The excision and reintegration rates of a single PB from two independent genomic loci were measured and its ability to mutate genes with gene trap cassettes was tested. We examined PB's integration site distribution in the mouse genome and found that PB transposition exhibited local hopping. The comprehensive information from this study should facilitate further exploration of the potential of PB and SB DNA transposons in mammalian genetics.
Funded by: NIGMS NIH HHS: 5R21GM079528, R21 GM079528; Wellcome Trust
Proceedings of the National Academy of Sciences of the United States of America 2008;105;27;9290-5
Genome analysis of the platypus reveals unique signatures of evolution.
Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Funded by: Medical Research Council: MC_U137761446; NCI NIH HHS: P01 CA013106, P01 CA013106-37; NHGRI NIH HHS: HG002238, R01 HG002238, R01 HG002385, R01 HG002939, R01 HG004037, R01 HG004037-02, R01HG02385; NIGMS NIH HHS: R01 GM059290, R01 GM59290; Wellcome Trust: 062023
urg1: a uracil-regulatable promoter system for fission yeast with short induction and repression times.
Cancer Research United Kingdom Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
Background: The fission yeast Schizosaccharomyces pombe is a popular genetic model organism with powerful experimental tools. The thiamine-regulatable nmt1 promoter and derivatives, which take >15 hours for full induction, are most commonly used for controlled expression of ectopic genes. Given the short cell cycle of fission yeast, however, a promoter system that can be rapidly regulated, similar to the GAL system for budding yeast, would provide a key advantage for many experiments.
Methodology/principal findings: We used S. pombe microarrays to identify three neighbouring genes (urg1, urg2, and urg3) whose transcript levels rapidly and strongly increased in response to uracil, a condition which otherwise had little effect on global gene expression. We cloned the promoter of urg1 (uracil-regulatable gene) to create several PCR-based gene targeting modules for replacing native promoters with the urg1 promoter (Purg1) in the normal chromosomal locations of genes of interest. The kanMX6 and natMX6 markers allow selection under urg1 induced and repressed conditions, respectively. Some modules also allow N-terminal tagging of gene products placed under urg1 control. Using pom1 as a proof-of-principle, we observed a maximal increase of Purg1-pom1 transcripts after uracil addition within less than 30 minutes, and a similarly rapid decrease after uracil removal. The induced and repressed transcriptional states remained stable over 24-hour periods. RT-PCR comparisons showed that both induced and repressed Purg1-pom1 transcript levels were lower than corresponding P3nmt1-pom1 levels (wild-type nmt1 promoter) but higher than P81nmt1-pom1 levels (weak nmt1 derivative).
Conclusions/significance: We exploited the urg1 promoter system to rapidly induce pom1 expression at defined cell-cycle stages, showing that ectopic pom1 expression leads to cell branching in G2-phase but much less so in G1-phase. The high temporal resolution provided by the urg1 promoter should facilitate experimental design and improve the genetic toolbox for the fission yeast community.
Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118
PloS one 2008;3;1;e1428
Genome-wide association analysis identifies 20 loci that influence adult height.
Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Magdalen Road, Exeter EX1 2LU, UK.
Adult height is a model polygenic trait, but there has been limited success in identifying the genes underlying its normal variation. To identify genetic variants influencing adult human height, we used genome-wide association data from 13,665 individuals and genotyped 39 variants in an additional 16,482 samples. We identified 20 variants associated with adult height (P < 5 x 10(-7), with 10 reaching P < 1 x 10(-10)). Combined, the 20 SNPs explain approximately 3% of height variation, with an approximately 5 cm difference between the 6.2% of people with 17 or fewer 'tall' alleles and the 5.5% with 27 or more 'tall' alleles. The loci we identified implicate genes in Hedgehog signaling (IHH, HHIP, PTCH1), extracellular matrix (EFEMP1, ADAMTSL3, ACAN) and cancer (CDK6, HMGA2, DLEU7) pathways, and provide new insights into human growth and developmental processes. Finally, our results provide insights into the genetic architecture of a classic quantitative trait.
Funded by: British Heart Foundation: FS/05/061/19501, PG/02/128/14470, PG02/128; Medical Research Council: G0600705, G0701863, G9521010, G9521010(63660), G9521010D, MC_U106188470; Wellcome Trust: 076113, 077011, 077016, 090532
Nature genetics 2008;40;5;575-83
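The 'tall' allele comparison in the abstract above rests on simple counting: 20 biallelic SNPs give each person between 0 and 40 'tall' alleles. The following is a minimal illustrative sketch of that arithmetic, not the authors' analysis code. The function names and the 0/1/2 genotype encoding are assumptions made for illustration; only the 20-SNP count and the 17 and 27 allele cut-offs come from the abstract.

```python
# Illustrative sketch only; names and genotype encoding are hypothetical.
N_SNPS = 20  # number of height-associated SNPs reported in the abstract


def tall_allele_count(genotypes):
    """Sum the 'tall' alleles carried across the 20 SNPs.

    `genotypes` holds one value per SNP: 0, 1 or 2 copies of the 'tall' allele.
    """
    if len(genotypes) != N_SNPS:
        raise ValueError(f"expected {N_SNPS} genotypes, got {len(genotypes)}")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1 or 2")
    return sum(genotypes)


def score_group(count):
    """Place an allele count in one of the groups compared in the abstract."""
    if count <= 17:
        return "17 or fewer"
    if count >= 27:
        return "27 or more"
    return "18 to 26"
```

A person heterozygous at all 20 SNPs carries 20 'tall' alleles and falls in the middle group; the approximately 5 cm difference reported in the abstract is between the two extreme groups.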
Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution.
Cancer Research UK Fission Yeast Functional Genomics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
Recent data from several organisms indicate that the transcribed portions of genomes are larger and more complex than expected, and that many functional properties of transcripts are based not on coding sequences but on regulatory sequences in untranslated regions or non-coding RNAs. Alternative start and polyadenylation sites and regulation of intron splicing add further dimensions to the rich transcriptional output. This transcriptional complexity has been sampled mainly using hybridization-based methods under one or few experimental conditions. Here we applied direct high-throughput sequencing of complementary DNAs (RNA-Seq), supplemented with data from high-density tiling arrays, to globally sample transcripts of the fission yeast Schizosaccharomyces pombe, independently from available gene annotations. We interrogated transcriptomes under multiple conditions, including rapid proliferation, meiotic differentiation and environmental stress, as well as in RNA processing mutants to reveal the dynamic plasticity of the transcriptional landscape as a function of environmental, developmental and genetic factors. High-throughput sequencing proved to be a powerful and quantitative method to sample transcriptomes deeply at maximal resolution. In contrast to hybridization, sequencing showed little, if any, background noise and was sensitive enough to detect widespread transcription in >90% of the genome, including traces of RNAs that were not robustly transcribed or rapidly degraded. The combined sequencing and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation. Sequence reads spanning exon-exon or exon-intron junctions give unique insight into a surprising variability in splicing efficiency across introns, genes and conditions. Splicing efficiency was largely coordinated with transcript levels, and increased transcription led to increased splicing in test genes. Hundreds of introns showed such regulated splicing during cellular proliferation or differentiation.
Funded by: Cancer Research UK: A6517, C9546/A6517; Wellcome Trust: 077118
The vertebrate genome annotation (Vega) database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. firstname.lastname@example.org
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of human annotated loci has more than doubled to close to 33 000 and now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.
Funded by: NHGRI NIH HHS: U54 HG004555; Wellcome Trust: 077198
Nucleic acids research 2008;36;Database issue;D753-60
Sequence differentiation in regions identified by a genome scan for local adaptation.
Institute of Integrative and Comparative Biology, University of Leeds, Leeds LS2 9JT, UK.
Genome scans using large numbers of randomly selected markers have revealed a small proportion of loci that deviate from neutral expectations and so may mark genomic regions that contribute to local adaptation. Measurements of sequence differentiation and identification of genes in these regions are important but difficult, especially in organisms with limited genetic information available. We have followed up a genome scan in the marine gastropod, Littorina saxatilis, by searching a bacterial artificial chromosome library with differentiated and undifferentiated markers, sequencing four bacterial artificial chromosomes and then analysing sequence variation in population samples for fragments at, and close to, the original marker polymorphisms. We show that sequence differentiation follows the patterns expected from the original marker frequencies, that differentiated markers identify independent and highly localized sites and that these sites fall outside coding regions. Two differentiated loci are characterized by insertions of putative transposable elements that appear to have increased in frequency recently and which might influence expression of downstream genes. These results provide strong candidate loci for the study of local adaptation in Littorina. They demonstrate an approach that can be applied to follow up genome scans in other taxa and they show that the genome scan approach can lead rapidly to candidate genes in nonmodel organisms.
Funded by: Biotechnology and Biological Sciences Research Council
Molecular ecology 2008;17;13;3123-35
Variation of the oxytocin/neurophysin I (OXT) gene in four human populations.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambs., CB10 1SA, UK.
Oxytocin is a short peptide with multiple functions in human biology and has been implicated in autism. We aimed to determine the normal pattern of variation around the oxytocin gene and resequenced it and its flanking regions in 91 individuals from four HapMap populations and one chimpanzee. We identified 14 single nucleotide polymorphisms (SNPs), all noncoding, including eight that were novel. Population genetic analyses were largely consistent with a neutral evolutionary history, but a Hudson-Kreitman-Aguadé (HKA) test revealed more variation within the human population than expected from the level of chimpanzee-human divergence.
Funded by: Wellcome Trust: 077009
Journal of human genetics 2008;53;7;637-43
Adaptive evolution of UGT2B17 copy-number variation.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
The human UGT2B17 gene varies in copy number from zero to two per individual and also differs in mean number between populations from Africa, Europe, and East Asia. We show that such a high degree of geographical variation is unusual and investigate its evolutionary history. This required first reinterpreting the reference sequence in this region of the genome, which is misassembled from the two different alleles separated by an artifactual gap. A corrected assembly identifies the polymorphism as a 117 kb deletion arising by nonallelic homologous recombination between approximately 4.9 kb segmental duplications and allows the deletion breakpoint to be identified. We resequenced approximately 12 kb of DNA spanning the breakpoint in 91 humans from three HapMap and one extended HapMap populations and one chimpanzee. Diversity was unusually high and the time to the most recent common ancestor was estimated at approximately 2.4 or approximately 3.0 million years by two different methods, with evidence of balancing selection in Europe. In contrast, diversity was low in East Asia where a single haplotype predominated, suggesting positive selection for the deletion in this part of the world.
Funded by: Wellcome Trust
American journal of human genetics 2008;83;3;337-46
Y-chromosomal diversity in Lebanon is structured by recent historical events.
The Lebanese American University, Chouran, Beirut 1102 2801, Lebanon.
Lebanon is an eastern Mediterranean country inhabited by approximately four million people with a wide variety of ethnicities and religions, including Muslim, Christian, and Druze. In the present study, 926 Lebanese men were typed with Y-chromosomal SNP and STR markers, and unusually, male genetic variation within Lebanon was found to be more strongly structured by religious affiliation than by geography. We therefore tested the hypothesis that migrations within historical times could have contributed to this situation. Y-haplogroup J*(xJ2) was more frequent in the putative Muslim source region (the Arabian Peninsula) than in Lebanon, and it was also more frequent in Lebanese Muslims than in Lebanese non-Muslims. Conversely, haplogroup R1b was more frequent in the putative Christian source region (western Europe) than in Lebanon and was also more frequent in Lebanese Christians than in Lebanese non-Christians. The most common R1b STR-haplotype in Lebanese Christians was otherwise highly specific for western Europe and was unlikely to have reached its current frequency in Lebanese Christians without admixture. We therefore suggest that the Islamic expansion from the Arabian Peninsula beginning in the seventh century CE introduced lineages typical of this area into those who subsequently became Lebanese Muslims, whereas the Crusader activity in the 11th-13th centuries CE introduced western European lineages into Lebanese Christians.
Funded by: Wellcome Trust
American journal of human genetics 2008;82;4;873-82
Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and approximately 2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 x 10(-14)), CDC123-CAMK1D (P = 1.2 x 10(-10)), TSPAN8-LGR5 (P = 1.1 x 10(-9)), THADA (P = 1.1 x 10(-9)), ADAMTS9 (P = 1.2 x 10(-8)) and NOTCH2 (P = 4.1 x 10(-8)) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.
Funded by: Department of Health: DHCS/07/07/008; Intramural NIH HHS: Z01 HG000024; Medical Research Council: G0000649, G0600705, G0601261, MC_U106179471; NCI NIH HHS: CA87969, P01 CA087969; NHGRI NIH HHS: HG002651, N01HG65403, R01 HG002651, U01 HG004171, U01 HG004399; NHLBI NIH HHS: HL084729, U01 HL084729; NIDA NIH HHS: U54 DA021519; NIDDK NIH HHS: DK062370, DK072193, DK58845, P30 DK040561, P30 DK040561-12, R01 DK029867, R01 DK058845, R01 DK062370, R01 DK072193, R56 DK062370, U01 DK062370; Wellcome Trust: 072960, 076113, 077016, 079557, 090532, GR072960
Nature genetics 2008;40;5;638-45
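The abstract above pools three genome-wide scans but does not say how per-study results were combined. As a hedged illustration of one common approach, the sketch below implements a fixed-effect, inverse-variance-weighted meta-analysis for a single SNP. The choice of method, the function name and the inputs (per-study effect estimates and standard errors) are assumptions made for illustration and are not taken from the paper.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for one SNP.

    `betas` are per-study effect estimates (e.g. log odds ratios) and
    `ses` their standard errors. Hypothetical helper for illustration only.
    Returns (pooled beta, pooled standard error, z score, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

With three equally precise studies reporting the same effect, the pooled estimate is unchanged while its standard error shrinks by a factor of the square root of three, which shows why combining scans raises power to detect variants with modest effects.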