Sanger Institute - Publications 2005
Number of papers published in 2005: 85
A genome-wide, end-sequenced 129Sv BAC library resource for targeting vector construction.
The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
The majority of gene-targeting experiments in mice are performed in 129Sv-derived embryonic stem (ES) cell lines, which are generally considered to be more reliable at colonizing the germ line than ES cells derived from other strains. Gene targeting is reliant on homologous recombination of a targeting vector with the host ES cell genome. The efficiency of recombination is affected by many factors, including the isogenicity (H. te Riele et al., 1992, Proc. Natl. Acad. Sci. USA 89, 5128-5132) and the length of homologous sequence of the targeting vector and the location of the target locus. Here we describe the double-end sequencing and mapping of 84,507 bacterial artificial chromosomes (BACs) generated from AB2.2 ES cell DNA (129S7/SvEvBrd-Hprtb-m2). We have aligned these BACs against the mouse genome and displayed them on the Ensembl genome browser, DAS: 129S7/AB2.2. This library has an average insert size of 110.68 kb and average depth of genome coverage of 3.63- and 1.24-fold across the autosomes and sex chromosomes, respectively. Over 97% of the mouse genome and 99.1% of Ensembl genes are covered by clones from this library. This publicly available BAC resource can be used for the rapid construction of targeting vectors via recombineering. Furthermore, we show that targeting vectors containing DNA recombineered from this BAC library can be used to target genes efficiently in several 129-derived ES cell lines.
Funded by: Wellcome Trust
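As a rough plausibility check on the coverage figures quoted in the abstract above, the expected fold coverage of a clone library can be estimated as (number of clones x mean insert size) / genome size. The sketch below uses the clone count and insert size from the abstract; the genome size of about 2.6 Gb is an assumption introduced here for illustration and does not come from the abstract. The estimate ignores unmapped clones and chromosome-specific effects, so it should only be expected to land near the reported autosomal depth, not to reproduce it.

```python
# Back-of-envelope estimate of BAC library coverage.
N_CLONES = 84_507          # end-sequenced BACs, from the abstract
MEAN_INSERT_KB = 110.68    # average insert size in kb, from the abstract
ASSUMED_GENOME_GB = 2.6    # ASSUMPTION: approximate mouse genome size (not in the abstract)


def fold_coverage(n_clones: int, mean_insert_kb: float, genome_gb: float) -> float:
    """Return total cloned sequence divided by genome size (both in kb)."""
    total_kb = n_clones * mean_insert_kb
    genome_kb = genome_gb * 1_000_000  # 1 Gb = 1,000,000 kb
    return total_kb / genome_kb


total_cloned_gb = N_CLONES * MEAN_INSERT_KB / 1_000_000
coverage = fold_coverage(N_CLONES, MEAN_INSERT_KB, ASSUMED_GENOME_GB)

print(f"Total cloned sequence: {total_cloned_gb:.2f} Gb")
print(f"Estimated fold coverage: {coverage:.2f}")
```

Under the assumed genome size, the estimate is of the same order as the 3.63-fold autosomal depth reported in the abstract; a different genome-size assumption shifts the result proportionally.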
BRCTx is a novel, highly conserved RAD18-interacting protein.
The Wellcome Trust Sanger Institute, Hinxton, Cambs CB10 1SA, United Kingdom.
The BRCT domain is a highly conserved module found in many proteins that participate in DNA damage checkpoint regulation, DNA repair, and cell cycle control. Here we describe the cloning, characterization, and targeted mutagenesis of Brctx, a novel gene with a BRCT motif. Brctx was found to be expressed ubiquitously in adult tissues and during development, with the highest levels found in testis. Brctx-deficient mice develop normally, show no pathological abnormalities, and are fertile. BRCTx binds to the C terminus of hRAD18 in yeast two-hybrid and immunoprecipitation assays and colocalizes with this protein in the nucleus. Despite this, Brctx-deficient murine embryonic fibroblasts (MEFs) do not show overt sensitivity to DNA-damaging agents. MEFs from Brctx-deficient embryos grow at a similar rate to wild-type MEFs, and CD4/CD8 expression and the cell cycle parameters of thymocytes from wild-type and Brctx knockout animals are indistinguishable. Intriguingly, the BRCT domain of BRCTx is responsible for mediating its localization to the nucleus and centrosome in interphase cells. We conclude that, although highly conserved, Brctx is not essential for the above-mentioned processes and may be redundant.
Molecular and cellular biology 2005;25;2;779-88
Mutations of the catalytic subunit of RAB3GAP cause Warburg Micro syndrome.
Section of Medical and Molecular Genetics, University of Birmingham, Birmingham, B15 2TT, UK.
Warburg Micro syndrome (WARBM1) is a severe autosomal recessive disorder characterized by developmental abnormalities of the eye and central nervous system and by microgenitalia. We identified homozygous inactivating mutations in RAB3GAP, encoding RAB3 GTPase activating protein, a key regulator of the Rab3 pathway implicated in exocytic release of neurotransmitters and hormones, in 12 families with Micro syndrome. We hypothesize that the underlying pathogenesis of Micro syndrome is a failure of exocytic release of ocular and neurodevelopmental trophic factors.
Nature genetics 2005;37;3;221-3
The Vertebrate Genome Annotation (Vega) database.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.
Nucleic acids research 2005;33;Database issue;D459-65
Complex disease: pleiotropic gene effects in obesity and type 2 diabetes.
European journal of human genetics : EJHG 2005;13;12;1243-4
Genetics of Type 2 diabetes.
Metabolic Disease Group, The Wellcome Trust Sanger Institute, Cambridge, UK.
Type 2 diabetes (T2D) has become a health-care problem worldwide, with the rise in disease prevalence being all the more worrying as it not only affects the developed world but also developing nations with fewer resources to cope with yet another major disease burden. Furthermore, the problem is no longer restricted to the ageing population, as young adults and children are also being diagnosed with T2D. In recent years, there has been a surge in the number of genetic studies of T2D in attempts to identify some of the underlying risk factors. In this review, I highlight the main genes known to cause uncommon monogenic forms of diabetes (e.g. maturity-onset diabetes of the young [MODY] and insulin resistance syndromes), as well as describe some of the main approaches used to identify genes involved in the more common forms of T2D that result from the interaction between environmental risk factors and predisposing genotypes. Linkage and candidate gene studies have been highly successful in the identification of genes that cause the monogenic variants of diabetes and, although progress in the more common forms of T2D has been slow, a number of genes have now been reproducibly associated with T2D risk in multiple studies. These are discussed, as well as the main implications that the diabetes gene discoveries will have in diabetes treatment and prevention.
Diabetic medicine : a journal of the British Diabetic Association 2005;22;5;517-35
The G5 domain: a potential N-acetylglucosamine recognition domain involved in biofilm formation.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Summary: Biofilms are complex microbial communities found at surfaces that are often associated with extracellular polysaccharides. Biofilm formation is a complex process that is being understood at the molecular level only recently. We have identified a novel domain that we call the G5 domain (named after its conserved glycine residues), which is found in a variety of enzymes such as Streptococcal IgA peptidases and various glycosyl hydrolases in bacteria. The G5 domain is found in the Accumulation Associated Protein (AAP), which is an important component in biofilm formation in Staphylococcus aureus. A common feature of the proteins containing G5 domains is N-acetylglucosamine binding, and we attribute this function to the G5 domain.
Bioinformatics (Oxford, England) 2005;21;8;1301-3
Acquired mutation of the tyrosine kinase JAK2 in human myeloproliferative disorders.
Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK.
Background: Human myeloproliferative disorders form a range of clonal haematological malignant diseases, the main members of which are polycythaemia vera, essential thrombocythaemia, and idiopathic myelofibrosis. The molecular pathogenesis of these disorders is unknown, but tyrosine kinases have been implicated in several related disorders. We investigated the role of the cytoplasmic tyrosine kinase JAK2 in patients with a myeloproliferative disorder.
Methods: We obtained DNA samples from patients with polycythaemia vera, essential thrombocythaemia, or idiopathic myelofibrosis. The coding exons of JAK2 were bidirectionally sequenced from peripheral-blood granulocytes, T cells, or both. Allele-specific PCR, molecular cytogenetic studies, microsatellite PCR, Affymetrix single nucleotide polymorphism array analyses, and colony assays were undertaken on subgroups of patients.
Findings: A single point mutation (Val617Phe) was identified in JAK2 in 71 (97%) of 73 patients with polycythaemia vera, 29 (57%) of 51 with essential thrombocythaemia, and eight (50%) of 16 with idiopathic myelofibrosis. The mutation is acquired, is present in a variable proportion of granulocytes, alters a highly conserved valine present in the negative regulatory JH2 domain, and is predicted to dysregulate kinase activity. It was heterozygous in most patients, homozygous in a subset as a result of mitotic recombination, and arose in a multipotent progenitor capable of giving rise to erythroid and myeloid cells. The mutation was present in all erythropoietin-independent erythroid colonies.
Interpretation: A single acquired mutation of JAK2 was noted in more than half of patients with a myeloproliferative disorder. Its presence in all erythropoietin-independent erythroid colonies demonstrates a link with growth factor hypersensitivity, a key biological feature of these disorders.
Relevance to practice: Identification of the Val617Phe JAK2 mutation lays the foundation for new approaches to the diagnosis, classification, and treatment of myeloproliferative disorders.
Funded by: Wellcome Trust: 088340
Lancet (London, England) 2005;365;9464;1054-61
The genome of the African trypanosome Trypanosoma brucei.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
Funded by: NIAID NIH HHS: AI43062, R01 AI043062, U01 AI043062; Wellcome Trust
Science (New York, N.Y.) 2005;309;5733;416-22
Multiple mutations in mouse Chd7 provide models for CHARGE syndrome.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Mouse ENU mutagenesis programmes have yielded a series of independent mutations on proximal chromosome 4 leading to dominant head-bobbing and circling behaviour due to truncations of the lateral semicircular canal of the inner ear. Here, we report the identification of mutations in the Chd7 gene in nine of these mutant alleles including six nonsense and three splice site mutations. The human CHD7 gene is known to be involved in CHARGE syndrome, which also shows inner ear malformations and a variety of other features with varying penetrance and appears to be due to frequent de novo mutation. We found widespread expression of Chd7 in early development of the mouse in organs affected in CHARGE syndrome including eye, olfactory epithelium, inner ear and vascular system. Closer inspection of heterozygous mutant mice revealed a range of defects with reduced penetrance, such as cleft palate, choanal atresia, septal defects of the heart, haemorrhages, prenatal death, vulva and clitoral defects and keratoconjunctivitis sicca. Many of these defects mimic the features of CHARGE syndrome. There were no obvious features of the gene that might make it more mutable than other genes. We conclude that the large number of mouse mutants and human de novo mutations may be due to the combination of the Chd7 gene being a large target and the fact that many heterozygous carriers of the mutations are viable individuals with a readily detectable phenotype.
Funded by: Wellcome Trust
Human molecular genetics 2005;14;22;3463-76
Plasmodium falciparum variant surface antigen expression patterns during malaria.
Nuffield Department of Clinical Medicine, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom.
The variant surface antigens expressed on Plasmodium falciparum-infected erythrocytes are potentially important targets of immunity to malaria and are encoded, at least in part, by a family of var genes, about 60 of which are present within every parasite genome. Here we use semi-conserved regions within short var gene sequence "tags" to make direct comparisons of var gene expression in 12 clinical parasite isolates from Kenyan children. A total of 1,746 var clones were sequenced from genomic and cDNA and assigned to one of six sequence groups using specific sequence features. The results show the following. (1) The relative numbers of genomic clones falling in each of the sequence groups was similar between parasite isolates and corresponded well with the numbers of genes found in the genome of a single, fully sequenced parasite isolate. In contrast, the relative numbers of cDNA clones falling in each group varied considerably between isolates. (2) Expression of sequences belonging to a relatively conserved group was negatively associated with the repertoire of variant surface antigen antibodies carried by the infected child at the time of disease, whereas expression of sequences belonging to another group was associated with the parasite "rosetting" phenotype, a well established virulence determinant. Our results suggest that information on the state of the host-parasite relationship in vivo can be provided by measurements of the differential expression of different var groups, and need only be defined by short stretches of sequence data.
Funded by: Wellcome Trust: 060678, 631342
PLoS pathogens 2005;1;3;e26
The transcriptional landscape of the mammalian genome.
This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
Funded by: Telethon: TGM03P17, TGM06S01
Science (New York, N.Y.) 2005;309;5740;1559-63
ACT: the Artemis Comparison Tool.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The Artemis Comparison Tool (ACT) allows an interactive visualisation of comparisons between complete genome sequences and associated annotations. The comparison data can be generated with several different programs: BLASTN, TBLASTX or MUMmer comparisons between genomic DNA sequences, or orthologue tables generated by reciprocal FASTA comparison between protein sets. It is possible to identify regions of similarity, insertions and rearrangements at any level from the whole genome to base-pair differences. ACT uses Artemis components to display the sequences and so inherits powerful searching and analysis tools. ACT is part of the Artemis distribution and is similarly open source; it is written in Java and can run on any Java-enabled platform, including UNIX, Macintosh and Windows.
Bioinformatics (Oxford, England) 2005;21;16;3422-3
Nature reviews. Microbiology 2005;3;5;368-9
Extensive DNA inversions in the B. fragilis genome control variable gene expression.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
The obligately anaerobic bacterium Bacteroides fragilis, an opportunistic pathogen and inhabitant of the normal human colonic microbiota, exhibits considerable within-strain phase and antigenic variation of surface components. The complete genome sequence has revealed an unusual breadth (in number and in effect) of DNA inversion events that potentially control expression of many different components, including surface and secreted components, regulatory molecules, and restriction-modification proteins. Invertible promoters of two different types (12 group 1 and 11 group 2) were identified. One group has inversion crossover (fix) sites similar to the hix sites of Salmonella typhimurium. There are also four independent intergenic shufflons that potentially alter the expression and function of varied genes. The composition of the 10 different polysaccharide biosynthesis gene clusters identified (7 with associated invertible promoters) suggests a mechanism of synthesis similar to the O-antigen capsules of Escherichia coli.
Science (New York, N.Y.) 2005;307;5714;1463-5
Chromosome evolution in eukaryotes: a multi-kingdom perspective.
Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.
In eukaryotes, chromosomal rearrangements, such as inversions, translocations and duplications, are common and range from part of a gene to hundreds of genes. Lineage-specific patterns are also seen: translocations are rare in dipteran flies, and angiosperm genomes seem prone to polyploidization. In most eukaryotes, there is a strong association between rearrangement breakpoints and repeat sequences. Current data suggest that some repeats promoted rearrangements via non-allelic homologous recombination; for others, the association might not be causal but might instead reflect the instability of particular genomic regions. Rearrangement polymorphisms in eukaryotes are correlated with phenotypic differences, so are thought to confer varying fitness in different habitats. Some seem to be under positive selection because they either trap favorable allele combinations together or alter the expression of nearby genes. There is little evidence that chromosomal rearrangements cause speciation, but they probably intensify reproductive isolation between species that have formed by another route.
Funded by: NHGRI NIH HHS: HG02639; NIGMS NIH HHS: GM58815; Wellcome Trust
Trends in genetics : TIG 2005;21;12;673-82
Proteomic analysis of in vivo phosphorylated synaptic proteins.
Division of Neuroscience, University of Edinburgh, Edinburgh EH8 9JZ, UK.
In the nervous system, protein phosphorylation is an essential feature of synaptic function. Although protein phosphorylation is known to be important for many synaptic processes and in disease, little is known about global phosphorylation of synaptic proteins. Heterogeneity and low abundance make protein phosphorylation analysis difficult, particularly for mammalian tissue samples. Using a new approach, combining both protein and peptide immobilized metal affinity chromatography and mass spectrometry data acquisition strategies, we have produced the first large scale map of the mouse synapse phosphoproteome. We report over 650 phosphorylation events corresponding to 331 sites (289 have been unambiguously assigned), 92% of which are novel. These represent 79 proteins, half of which are novel phosphoproteins, and include several highly phosphorylated proteins such as MAP1B (33 sites) and Bassoon (30 sites). An additional 149 candidate phosphoproteins were identified by profiling the composition of the protein immobilized metal affinity chromatography enrichment. All major synaptic protein classes were observed, including components of important pre- and postsynaptic complexes as well as low abundance signaling proteins. Bioinformatic and in vitro phosphorylation assays of peptide arrays suggest that a small number of kinases phosphorylate many proteins and that each substrate is phosphorylated by many kinases. These data substantially increase existing knowledge of synapse protein phosphorylation and support a model where the synapse phosphoproteome is functionally organized into a highly interconnected signaling network.
The Journal of biological chemistry 2005;280;7;5972-82
The genome of the heartwater agent Ehrlichia ruminantium contains multiple tandem repeats of actively variable copy number.
Department of Veterinary Tropical Diseases, Faculty of Veterinary Science, University of Pretoria, Private Bag X04, Onderstepoort 0110, South Africa.
Heartwater, a tick-borne disease of domestic and wild ruminants, is caused by the intracellular rickettsia Ehrlichia ruminantium (previously known as Cowdria ruminantium). It is a major constraint to livestock production throughout sub-Saharan Africa, and it threatens to invade the Americas, yet there is no immediate prospect of an effective vaccine. A shotgun genome sequencing project was undertaken in the expectation that access to the complete protein coding repertoire of the organism will facilitate the search for vaccine candidate genes. We report here the complete 1,516,355-bp sequence of the type strain, the stock derived from the South African Welgevonden isolate. Only 62% of the genome is predicted to be coding sequence, encoding 888 proteins and 41 stable RNA species. The most striking feature is the large number of tandemly repeated and duplicated sequences, some of continuously variable copy number, which contributes to the low proportion of coding sequence. These repeats have mediated numerous translocation and inversion events that have resulted in the duplication and truncation of some genes and have also given rise to new genes. There are 32 predicted pseudogenes, most of which are truncated fragments of genes associated with repeats. Rather than being the result of the reductive evolution seen in other intracellular bacteria, these pseudogenes appear to be the product of ongoing sequence duplication events.
Funded by: NIAID NIH HHS: R01 AI047885, R01 AI47885
Proceedings of the National Academy of Sciences of the United States of America 2005;102;3;838-43
Diversity at every level.
Nature reviews. Microbiology 2005;3;3;196-7
The immunoglobulin heavy-chain locus in zebrafish: identification and expression of a previously unknown isotype, immunoglobulin Z.
Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA.
The only immunoglobulin heavy-chain classes known so far in teleosts have been mu and delta. We identify here a previously unknown class, immunoglobulin zeta, expressed in zebrafish and other teleosts. In the zebrafish heavy-chain locus, variable (V) gene segments lie upstream of two tandem diversity, joining and constant (DJC) clusters, resembling the mouse T cell receptor alpha (Tcra) and delta (Tcrd) locus. V genes rearrange to (DJC)(zeta) or to (DJC)(mu) without evidence of switch rearrangement. The zebrafish immunoglobulin zeta gene (ighz) and mouse Tcrd, which are proximal to the V gene array, are expressed earlier in development. In adults, ighz was expressed only in kidney and thymus, which are primary lymphoid organs in teleosts. This additional class adds complexity to the immunoglobulin repertoire and raises questions concerning the evolution of immunoglobulins and the regulation of the differential expression of ighz and ighm.
Funded by: NIAID NIH HHS: R01 AI08054
Nature immunology 2005;6;3;295-302
Somatic mutations of the protein kinase gene family in human lung cancer.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, United Kingdom.
Protein kinases are frequently mutated in human cancer and inhibitors of mutant protein kinases have proven to be effective anticancer drugs. We screened the coding sequences of 518 protein kinases (approximately 1.3 Mb of DNA per sample) for somatic mutations in 26 primary lung neoplasms and seven lung cancer cell lines. One hundred eighty-eight somatic mutations were detected in 141 genes. Of these, 35 were synonymous (silent) changes. This result indicates that most of the 188 mutations were "passenger" mutations that are not causally implicated in oncogenesis. However, an excess of approximately 40 nonsynonymous substitutions compared with that expected by chance (P = 0.07) suggests that some nonsynonymous mutations have been selected and are contributing to oncogenesis. There was considerable variation between individual lung cancers in the number of mutations observed and no mutations were found in lung carcinoids. The mutational spectra of most lung cancers were characterized by a high proportion of C:G > A:T transversions, compatible with the mutagenic effects of tobacco carcinogens. However, one neuroendocrine cancer cell line had a distinctive mutational spectrum reminiscent of UV-induced DNA damage. The results suggest that several mutated protein kinases may be contributing to lung cancer development, but that mutations in each one are infrequent.
Funded by: Wellcome Trust
Cancer research 2005;65;17;7591-5
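The counts reported in the lung cancer kinase screen above lend themselves to a quick arithmetic summary. The sketch below derives the number of nonsynonymous mutations, the total amount of DNA screened, and a crude average mutation density, using only figures stated in the abstract. The per-megabase average is an illustrative simplification introduced here: it pools all 33 samples even though the abstract notes considerable variation between tumours, and it is not a figure reported by the authors.

```python
# Summary arithmetic for the kinase screen, using only numbers from the abstract.
MB_PER_SAMPLE = 1.3        # approximate coding sequence screened per sample (Mb)
N_PRIMARY_TUMOURS = 26     # primary lung neoplasms
N_CELL_LINES = 7           # lung cancer cell lines
TOTAL_MUTATIONS = 188      # somatic mutations detected
SYNONYMOUS = 35            # silent changes among them

n_samples = N_PRIMARY_TUMOURS + N_CELL_LINES
nonsynonymous = TOTAL_MUTATIONS - SYNONYMOUS
total_mb_screened = n_samples * MB_PER_SAMPLE

# Crude pooled density; individual samples varied widely according to the abstract.
mutations_per_mb = TOTAL_MUTATIONS / total_mb_screened
synonymous_fraction = SYNONYMOUS / TOTAL_MUTATIONS

print(f"Samples screened: {n_samples}")
print(f"Nonsynonymous mutations: {nonsynonymous}")
print(f"Total DNA screened: {total_mb_screened:.1f} Mb")
print(f"Pooled mutation density: {mutations_per_mb:.2f} per Mb")
print(f"Synonymous fraction: {synonymous_fraction:.1%}")
```

The abstract's inference about selection rests on comparing the observed nonsynonymous count with the number expected by chance; that expectation depends on the authors' mutation model and is not reproduced here.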
Genomic sequence of the class II region of the canine MHC: comparison with the MHC of other mammalian species.
Genetics Section, Animal Health Trust, Lanwades Park, Kentford, Newmarket, Suffolk CB8 7UU, UK.
The domestic dog, Canis familiaris, is an excellent model species in which to study complex inherited diseases, having over 200 recognized breeds, each of which represents a closed gene pool. Overlapping canine genomic BAC clones were sequenced to obtain 711,521 bp of the canine classical and extended MHC class II regions. Analysis and annotation of this sequence reveals that it contains 45 loci, of which 29 are predicted to be functionally expressed. Comparison of the DLA class II sequence with those of the cat, human, and mouse highlights regions of syntenic conservation and species-specific gene rearrangement and duplication and gives an insight into the evolution of the DR region in the order Carnivora. Elucidation of functionally important dog class II genes and the identification of 23 microsatellite markers spanning this region will contribute significantly to the study of canine diseases that have an immune component.
Conserved non-genic sequences - an unexpected feature of mammalian genomes.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Mammalian genomes contain highly conserved sequences that are not functionally transcribed. These sequences are single copy and comprise approximately 1-2% of the human genome. Evolutionary analysis strongly supports their functional conservation, although their potentially diverse, functional attributes remain unknown. It is likely that genomic variation in conserved non-genic sequences is associated with phenotypic variability and human disorders. So how might their function and contribution to human disorders be examined?
Nature reviews. Genetics 2005;6;2;151-7
Exon array CGH: detection of copy-number changes at the resolution of individual exons in the human genome.
Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
The development of high-throughput screening methods such as array-based comparative genome hybridization (array CGH) allows screening of the human genome for copy-number changes. Current array CGH strategies have limits of resolution that make small (less than a few tens of kilobases) gains or losses of genomic DNA difficult to detect. We report here a significant improvement in the resolution of array CGH, with the development of an array platform that utilizes single-stranded DNA array elements to accurately measure copy-number changes of individual exons in the human genome. Using this technology, we screened 31 patient samples across an array containing a total of 162 exons for five disease genes and detected copy-number changes, ranging from whole-gene deletions and duplications to single-exon deletions and duplications, in 100% of the cases. Our data demonstrate that it is possible to screen the human genome for copy-number changes with array CGH at a resolution that is 2 orders of magnitude higher than that previously reported.
American journal of human genetics 2005;76;5;750-62
NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites.
Nucleic acids research 2005;33;5;1445-53
The genome of the social amoeba Dictyostelium discoideum.
Center for Biochemistry and Center for Molecular Medicine Cologne, University of Cologne, Joseph-Stelzmann-Str. 52, 50931 Cologne, Germany.
The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.
Funded by: NICHD NIH HHS: R01 HD035925
Comparative genomics of trypanosomatid parasitic protozoa.
Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. email@example.com
A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.
Funded by: NIAID NIH HHS: AI045039, AI45038, AI45061, R01 AI043062, U01 AI040599, U01 AI043062, U01 AI045038, U01 AI045039; Wellcome Trust
Science (New York, N.Y.) 2005;309;5733;404-9
Mutations of C-RAF are rare in human cancer because C-RAF has a low basal kinase activity compared with B-RAF.
The Institute of Cancer Research, Signal Transduction Team, Cancer Research UK Centre of Cell and Molecular Biology, London, United Kingdom.
The protein kinase B-RAF is mutated in approximately 8% of human cancers. Here we show that presumptive mutants of the closely related kinase, C-RAF, were detected in only 4 of 545 (0.7%) cancer cell lines. The activity of two of the mutated proteins is not significantly different from that of wild-type C-RAF and these variants may represent rare human polymorphisms. The basal and B-RAF-stimulated kinase activities of a third variant are unaltered but its activation by RAS is significantly reduced, suggesting that it may act in a dominant-negative manner to modulate pathway signaling. The fourth variant has elevated basal kinase activity and is hypersensitive to activation by RAS but does not transform mammalian cells. Furthermore, when we introduce the equivalent of the most common cancer mutation in B-RAF (V600E) into C-RAF, it only has a weak effect on kinase activity and does not convert C-RAF into an oncogene. This lack of activation occurs because C-RAF lacks a constitutive charge within a motif in the kinase domain called the N-region. This fundamental difference in RAF isoform regulation explains why B-RAF is frequently mutated in cancer whereas C-RAF mutations are rare.
Funded by: Wellcome Trust
Cancer research 2005;65;21;9719-26
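The mutation frequency quoted in the C-RAF abstract above (4 of 545 cell lines) can be checked with a short calculation. The sketch below recomputes the percentage and adds a Wilson score interval as one possible way to express the uncertainty of such a small count; the interval is an illustration added here and is not reported in the abstract. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Illustrative helper (not from the paper): z = 1.96 is the usual
    normal quantile for a two-sided 95% interval.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Counts taken from the abstract: 4 presumptive C-RAF mutants in 545 cell lines.
freq = 4 / 545
low, high = wilson_interval(4, 545)

print(f"C-RAF mutation frequency: {100 * freq:.1f}%")
print(f"Approximate 95% interval: {100 * low:.2f}% to {100 * high:.2f}%")
```

Under these assumptions the whole interval stays well below the roughly 8% frequency quoted for B-RAF, which is consistent with the contrast the abstract draws.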
Variation in the eNOS gene modifies the association between total energy expenditure and glucose intolerance.
National Institute of Diabetes and Digestive and Kidney Diseases, 1550 E. Indian School Rd., Phoenix, AZ 85014, USA. firstname.lastname@example.org
Endothelium-derived nitric oxide (NO) facilitates skeletal muscle glucose uptake. Energy expenditure induces the endothelial NO synthase (eNOS) gene, providing a mechanism for insulin-independent glucose disposal. The objective was to test 1) the association of genetic variation in eNOS, as assessed by haplotype-tagging single nucleotide polymorphisms (htSNPs), with type 2 diabetes, and 2) the interaction between eNOS haplotypes and total energy expenditure on glucose intolerance. Using multivariate models, we tested associations between eNOS htSNPs and diabetes (n = 461 and 474 case and control subjects, respectively) and glucose intolerance (two cohorts of n = 706 and 738 U.K. and Spanish Caucasians, respectively), and we tested eNOS x total energy expenditure interactions on glucose intolerance. An overall association between eNOS haplotype and diabetes was observed (P = 0.004). Relative to the most common haplotype (111), two haplotypes (121 and 212) tended to increase diabetes risk (OR 1.22, 95% CI 0.96-1.55), and one (122) was associated with decreased risk (0.58, 0.39-0.86). In the cohort studies, no association was observed between haplotypes and 2-h glucose (P > 0.10). However, we observed a significant total energy expenditure-haplotype interaction (P = 0.007). Genetic variation at the eNOS locus is associated with diabetes, which may be attributable to an enhanced effect of total energy expenditure on glucose disposal in individuals with specific eNOS haplotypes. Gene-environment interactions such as this may help explain why replication of genetic association frequently fails.
Funded by: Wellcome Trust
The molecular clock mediates leptin-regulated bone formation.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030; Bone Disease Program of Texas, Baylor College of Medicine, Houston, Texas 77030, USA.
The hormone leptin is a regulator of bone remodeling, a homeostatic function maintaining bone mass constant. Mice lacking molecular-clock components (Per and Cry), or lacking Per genes in osteoblasts, display high bone mass, suggesting that bone remodeling may also be subject to circadian regulation. Moreover, Per-deficient mice experience a paradoxical increase in bone mass following leptin intracerebroventricular infusion. Thus, clock genes may mediate the leptin-dependent sympathetic regulation of bone formation. We show that expression of clock genes in osteoblasts is regulated by the sympathetic nervous system and leptin. Clock genes mediate the antiproliferative function of sympathetic signaling by inhibiting G1 cyclin expression. Partially antagonizing this inhibitory loop, leptin also upregulates AP-1 gene expression, which promotes cyclin D1 expression, osteoblast proliferation, and bone formation. Thus, leptin determines the extent of bone formation by modulating, via sympathetic signaling, osteoblast proliferation through two antagonistic pathways, one of which involves the molecular clock.
Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae.
The Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, Massachusetts 02142, USA.
The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.
Funded by: Biotechnology and Biological Sciences Research Council: CFB17726; NIGMS NIH HHS: R01 GM058529
Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes.
Institute for Genomic Research (TIGR), 9712 Medical Center Drive, Rockville, MD 20850, USA. email@example.com
We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.
Science (New York, N.Y.) 2005;309;5731;134-7
MicroRNAs regulate brain morphogenesis in zebrafish.
Developmental Genetics Program, Skirball Institute of Biomolecular Medicine and Department of Cell Biology, New York University School of Medicine, New York, NY 10016, USA. firstname.lastname@example.org
MicroRNAs (miRNAs) are small RNAs that regulate gene expression posttranscriptionally. To block all miRNA formation in zebrafish, we generated maternal-zygotic dicer (MZdicer) mutants that disrupt the Dicer ribonuclease III and double-stranded RNA-binding domains. Mutant embryos do not process precursor miRNAs into mature miRNAs, but injection of preprocessed miRNAs restores gene silencing, indicating that the disrupted domains are dispensable for later steps in silencing. MZdicer mutants undergo axis formation and differentiate multiple cell types but display abnormal morphogenesis during gastrulation, brain formation, somitogenesis, and heart development. Injection of miR-430 miRNAs rescues the brain defects in MZdicer mutants, revealing essential roles for miRNAs during morphogenesis.
Science (New York, N.Y.) 2005;308;5723;833-8
AMPA receptor trafficking and GluR1.
Science (New York, N.Y.) 2005;310;5746;234-5; author reply 234-5
Rfam: annotating non-coding RNAs in complete genomes.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. email@example.com
Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/.
Nucleic acids research 2005;33;Database issue;D121-4
A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses.
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.
Science (New York, N.Y.) 2005;307;5706;82-6
Facilitating genome navigation: survey sequencing and dense radiation-hybrid gene mapping.
CNRS, UMR 6061, Génétique et développement, Faculté de Médecine, Rennes, France.
Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences lack long-range continuity and provide only a fragmented view of a genome. Here we show the usefulness of combining survey sequencing with dense radiation-hybrid (RH) maps for extracting maximum comparative genome information from model organisms. Based on results from the canine system, we propose that from now on all low-pass sequencing projects should be accompanied by a dense, gene-based RH map-construction effort to extract maximum information from the genome with a marginal extra cost.
Nature reviews. Genetics 2005;6;8;643-8
Food for thought.
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nature reviews. Microbiology 2005;3;12;912-3
Nature reviews. Microbiology 2005;3;10;748-9
Ensembl 2005.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution.
Nucleic acids research 2005;33;Database issue;D447-53
How homologous recombination generates a mutable genome.
Wellcome Trust Sanger Institute, Genome Campus, Cambridge, CB10 1SA, UK. email@example.com
Recombination and mutation have traditionally been regarded as independent evolutionary processes: the latter generates variation, which the former reshuffles. Recent studies, however, have suggested that allelic recombination influences the underlying mutation rate, as high mutation rates are inferred in regions of high recombination. Furthermore, recombination between duplicated sequences introduces structural variation into the human genome and facilitates the formation of clustered gene families. Comparisons of whole-genome sequences reveal the expansion of gene family clusters to be an important mode of genome evolution. The negative aspect of this genomic dynamism is the contribution of these rearrangements to genetic diseases.
Funded by: Wellcome Trust
Human genomics 2005;2;3;179-86
The dual origin of the Malagasy in Island Southeast Asia and East Africa: evidence from maternal and paternal lineages.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom. firstname.lastname@example.org
Linguistic and archaeological evidence about the origins of the Malagasy, the indigenous peoples of Madagascar, points to mixed African and Indonesian ancestry. By contrast, genetic evidence about the origins of the Malagasy has hitherto remained partial and imprecise. We defined 26 Y-chromosomal lineages by typing 44 Y-chromosomal polymorphisms in 362 males from four different ethnic groups from Madagascar and 10 potential ancestral populations in Island Southeast Asia and the Pacific. We also compared mitochondrial sequence diversity in the Malagasy with a manually curated database of 19,371 hypervariable segment I sequences, incorporating both published and unpublished data. We could attribute every maternal and paternal lineage found in the Malagasy to a likely geographic origin. Here, we demonstrate approximately equal African and Indonesian contributions to both paternal and maternal Malagasy lineages. The most likely origin of the Asia-derived paternal lineages found in the Malagasy is Borneo. This agrees strikingly with the linguistic evidence that the languages spoken around the Barito River in southern Borneo are the closest extant relatives of Malagasy languages. As a result of their equally balanced admixed ancestry, the Malagasy may represent an ideal population in which to identify loci underlying complex traits of both anthropological and medical interest.
Funded by: Wellcome Trust: 057559
American journal of human genetics 2005;76;5;894-901
A haplotype map of the human genome.
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
Funded by: NHGRI NIH HHS: R01 HG001720, R01 HG001720-06
The genome of the kinetoplastid parasite, Leishmania major.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. email@example.com
Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.
Funded by: NIAID NIH HHS: R01 AI040599, R01 AI053667; Wellcome Trust
Science (New York, N.Y.) 2005;309;5733;436-42
Nodulation signaling in legumes requires NSP2, a member of the GRAS family of transcriptional regulators.
Departments of Disease and Stress Biology and Molecular Microbiology, John Innes Centre, Norwich NR4 7UH, UK.
Rhizobial bacteria enter a symbiotic interaction with legumes, activating diverse responses in roots through the lipochitooligosaccharide signaling molecule Nod factor. Here, we show that NSP2 from Medicago truncatula encodes a GRAS protein essential for Nod-factor signaling. NSP2 functions downstream of Nod-factor-induced calcium spiking and a calcium/calmodulin-dependent protein kinase. We show that NSP2-GFP expressed from a constitutive promoter is localized to the endoplasmic reticulum/nuclear envelope and relocalizes to the nucleus after Nod-factor elicitation. This work provides evidence that a GRAS protein transduces calcium signals in plants and provides a possible regulator of Nod-factor-inducible gene expression.
Science (New York, N.Y.) 2005;308;5729;1786-9
Sox2 is required for sensory organ development in the mammalian inner ear.
MRC Institute of Hearing Research, University of Nottingham, Nottingham NG7 2RD, UK.
Sensory hair cells and their associated non-sensory supporting cells in the inner ear are fundamental for hearing and balance. They arise from a common progenitor, but little is known about the molecular events specifying this cell lineage. We recently identified two allelic mouse mutants, light coat and circling (Lcc) and yellow submarine (Ysb), that show hearing and balance impairment. Lcc/Lcc mice are completely deaf, whereas Ysb/Ysb mice are severely hearing impaired. We report here that inner ears of Lcc/Lcc mice fail to establish a prosensory domain and neither hair cells nor supporting cells differentiate, resulting in a severe inner ear malformation, whereas the sensory epithelium of Ysb/Ysb mice shows abnormal development with disorganized and fewer hair cells. These phenotypes are due to the absence (in Lcc mutants) or reduced expression (in Ysb mutants) of the transcription factor SOX2, specifically within the developing inner ear. SOX2 continues to be expressed in the inner ears of mice lacking Math1 (also known as Atoh1 and HATH1), a gene essential for hair cell differentiation, whereas Math1 expression is absent in Lcc mutants, suggesting that Sox2 acts upstream of Math1.
Funded by: Medical Research Council: MC_U117562207
Myosin VI is required for normal retinal function.
Department of Pharmacology, UCSD School of Medicine, La Jolla, CA 92093-0912, USA.
Different unconventional myosins have been shown to play important roles in sensory function, including vision. We investigated the role of myosin VI by examining the retinas of mice carrying a null mutation in the myosin VI gene. Myosin VI was found to be present in the photoreceptor and RPE cells of normal retinas. In the absence of myosin VI, the amplitudes of the a- and b-waves of the electroretinogram were reduced, although there was no photoreceptor cell loss and retinal anatomy appeared normal. Our results indicate that myosin VI is required in photoreceptor cells for normal retinal electrophysiology.
Funded by: NEI NIH HHS: EY12598
Experimental eye research 2005;81;1;116-20
The complete genome sequence of Francisella tularensis, the causative agent of tularemia.
Swedish Defence Research Agency, SE-901 82 Umeå, Sweden.
Francisella tularensis is one of the most infectious human pathogens known. In the past, both the former Soviet Union and the US had programs to develop weapons containing the bacterium. We report the complete genome sequence of a highly virulent isolate of F. tularensis (1,892,819 bp). The sequence uncovers previously uncharacterized genes encoding type IV pili, a surface polysaccharide and iron-acquisition systems. Several virulence-associated genes were located in a putative pathogenicity island, which was duplicated in the genome. More than 10% of the putative coding sequences contained insertion-deletion or substitution mutations and seemed to be deteriorating. The genome is rich in IS elements, including IS630 Tc-1 mariner family transposons, which are not expected in a prokaryote. We used a computational method for predicting metabolic pathways and found an unexpectedly high proportion of disrupted pathways, explaining the fastidious nutritional requirements of the bacterium. The loss of biosynthetic pathways indicates that F. tularensis is an obligate host-dependent bacterium in its natural life cycle. Our results have implications for our understanding of how highly virulent human pathogens evolve and will expedite strategies to combat them.
Nature genetics 2005;37;2;153-9
Genome sequence, comparative analysis and haplotype structure of the domestic dog.
Broad Institute of Harvard and MIT, 320 Charles Street, Cambridge, Massachusetts 02141, USA. firstname.lastname@example.org
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Funded by: Intramural NIH HHS
Shotgun haplotyping: a novel method for surveying allelic sequence variation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Haplotypic sequences contain significantly more information than genotypes of genetic markers and are critical for studying disease association and genome evolution. Current methods for obtaining haplotypic sequences require the physical separation of alleles before sequencing, are time consuming and are not scalable for large surveys of genetic variation. We have developed a novel method for acquiring haplotypic sequences from long PCR products using simple, high-throughput techniques. This method applies modified shotgun sequencing protocols to sequence both alleles concurrently, with read-pair information allowing the two alleles to be separated during sequence assembly. Although the haplotypic sequences can be assembled manually from the resultant data using pre-existing sequence assembly software, we have devised a novel heuristic algorithm to automate assembly and remove human error. We validated the approach on two long PCR products amplified from the human genome and confirmed the accuracy of our sequences against full-length clones of the same alleles. This method presents a simple high-throughput means to obtain full haplotypic sequences potentially up to 20 kb in length and is suitable for surveying genetic variation even in poorly characterized genomes as it requires no prior information on sequence variation.
Funded by: Wellcome Trust
Nucleic acids research 2005;33;18;e152
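The shotgun haplotyping abstract above relies on read-pair information to separate two alleles during assembly. The abstract does not describe the authors' heuristic, so the following is only a toy sketch of the general idea: a greedy assignment of fragments to two haplotypes based on the bases they carry at heterozygous sites. The function name `phase_fragments` and the input format (one dict per read pair, mapping a site position to the observed base) are assumptions made for illustration.

```python
def phase_fragments(fragments):
    """Greedy toy phasing: split fragments into two haplotypes.

    Each fragment is a dict {site_position: base} holding the heterozygous
    sites covered by one read pair (hypothetical input format).
    Returns (hap_a, hap_b), each a dict {site_position: base}.
    This is NOT the published algorithm, only an illustration.
    """
    hap_a, hap_b = {}, {}
    pending = list(fragments)
    if pending:
        # Seed haplotype A with the first fragment.
        hap_a.update(pending.pop(0))
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for frag in pending:
            # Agreement with one haplotype, or conflict with the other,
            # both count as evidence for the same assignment.
            agree_a = sum(1 for s, b in frag.items() if hap_a.get(s) == b)
            clash_a = sum(1 for s, b in frag.items() if s in hap_a and hap_a[s] != b)
            agree_b = sum(1 for s, b in frag.items() if hap_b.get(s) == b)
            clash_b = sum(1 for s, b in frag.items() if s in hap_b and hap_b[s] != b)
            score = (agree_a + clash_b) - (agree_b + clash_a)
            if score > 0:
                target = hap_a
            elif score < 0:
                target = hap_b
            else:
                # No linking evidence yet; retry on the next pass.
                remaining.append(frag)
                continue
            for s, b in frag.items():
                target.setdefault(s, b)
            progress = True
        pending = remaining
    return hap_a, hap_b


# Four made-up read pairs linking three heterozygous sites.
example = [
    {100: "A", 250: "G"},
    {100: "C", 250: "T"},
    {250: "G", 400: "T"},
    {250: "T", 400: "C"},
]
hap_a, hap_b = phase_fragments(example)
print(hap_a)
print(hap_b)
```

A greedy pass like this ignores sequencing errors and unlinked sites; it is meant only to show how overlapping read pairs chain heterozygous positions into two consistent sequences.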
The genome of the protist parasite Entamoeba histolytica.
TIGR, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. email@example.com
Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome. Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.
VEGA, the genome browser with a difference.
HAVANA Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK. firstname.lastname@example.org
The Vertebrate Genome Annotation (Vega) database is a community resource for browsing manual annotation from a variety of vertebrate genomes of finished sequence (http://vega.sanger.ac.uk). Vega is different from other genome browsers as it has a standardised classification of genes which encompasses pseudogenes and non-coding transcripts. The data is manually curated, which is more accurate at identifying splice variants, pseudogenes, poly(A) features, non-coding and complex gene structures and arrangements than current automated methods. The database also contains annotation from regions, not just whole genomes, and displays multiple species annotation (human, mouse, dog and zebrafish) for comparative analysis. Vega encourages community feedback that results in annotation updates and manual annotation of finished vertebrate sequence.
Briefings in bioinformatics 2005;6;2;189-93
PPARGC1A genotype (Gly482Ser) predicts exceptional endurance capacity in European men.
European University of Madrid, Spain.
Animal and human data indicate a role for the peroxisome proliferator-activated receptor-gamma coactivator 1alpha (PPARGC1A) gene product in the development of maximal oxygen uptake (VO2max), a determinant of endurance capacity, diabetes, and early death. We tested the hypothesis that the frequency of the minor Ser482 allele at the PPARGC1A locus is lower in world-class Spanish male endurance athletes (cases) [n = 104; mean (SD) age: 26.8 (3.8) yr] than in unfit United Kingdom (UK) Caucasian male controls [n = 100; mean (SD) age: 49.3 (8.1) yr]. In cases and controls, the Gly482Ser genotype met Hardy-Weinberg expectations (P > 0.05 in both groups tested separately). Cases had significantly higher VO2max [73.4 (5.7) vs. 29.4 (3.8) ml·kg⁻¹·min⁻¹; P < 0.0001] and were leaner [body mass index: 20.6 (1.5) vs. 27.6 (3.9) kg/m²; P < 0.0001] than controls. In unadjusted chi-squared analyses, the frequency of the minor Ser482 allele was significantly lower in cases than in controls (29.1 vs. 40.0%; P = 0.01). To assess the possibility that genetic stratification could confound these observations, we also compared Gly482Ser genotype frequencies in Spanish (n = 164) and UK Caucasian men (n = 381) who were unselected for their level of fitness. In these analyses, Ser482 allele frequencies were very similar (36.9% in Spanish vs. 37.5% in UK Caucasians, P = 0.83), suggesting that confounding by genetic stratification is unlikely to explain the association between Gly482Ser genotype and endurance capacity. In summary, our data indicate a role for the Gly482Ser genotype in determining aerobic fitness. This finding has relevance from the perspective of physical performance, but it may also be informative for the targeted prevention of diseases associated with low fitness such as Type 2 diabetes.
Funded by: Wellcome Trust
Journal of applied physiology (Bethesda, Md. : 1985) 2005;99;1;344-8
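The allele-frequency comparisons in the PPARGC1A abstract above are chi-squared tests on counts of alleles. As a hedged illustration, the sketch below runs a Pearson chi-squared test on a 2x2 table for the stratification check (unselected Spanish vs. UK men). The allele counts are not given in the abstract; they are reconstructed here by assuming two alleles per man and rounding the reported percentages (36.9% of 328 and 37.5% of 762), so they are assumptions. The function name `chi2_2x2` is hypothetical, and the abstract does not say which test variant the authors used.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (chi2, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Assumed allele counts (Ser482, Gly482), reconstructed from the abstract:
spanish = (121, 328 - 121)   # 164 men -> 328 alleles, about 36.9% Ser482
uk = (286, 762 - 286)        # 381 men -> 762 alleles, about 37.5% Ser482

chi2, p_value = chi2_2x2(spanish[0], spanish[1], uk[0], uk[1])
print(f"chi2 = {chi2:.3f}, P = {p_value:.2f}")
```

With these reconstructed counts the P value should land close to the reported P = 0.83, though not necessarily on it, since the exact counts and test settings are not stated.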
Critical assessment of methods of protein structure prediction (CASP)--round 6.
Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA. email@example.com
This article is an introduction to the special issue of the journal Proteins, dedicated to the sixth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct of the experiment and the categories of prediction included, and outlines the evaluation and assessment procedures. A brief summary of progress over the decade of CASP experiments is also provided.
Funded by: NIGMS NIH HHS: GM072354; NLM NIH HHS: LM07085
Proteins 2005;61 Suppl 7;3-7
Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus.
The Institute for Genomic Research, Rockville, Maryland 20850, USA. firstname.lastname@example.org
Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.
Funded by: Biotechnology and Biological Sciences Research Council: CFB17726; Wellcome Trust
Genetic factors in type 2 diabetes: the end of the beginning?
University of Cambridge, Department of Clinical Biochemistry, Addenbrooke's Hospital, Cambridge CB2 2QQ, UK. email@example.com
The intensive search for genetic variants that predispose to type 2 diabetes was launched with optimism, but progress has been slower than was hoped. Even so, major advances have been made in the understanding of monogenic forms of the disease which together represent a substantial health burden, and a few common gene variants that influence susceptibility have now been unequivocally identified. Armed with a better understanding of the tools needed to detect such genes, it seems inevitable that the rate of progress will increase and the relevance of genetic information to the diagnosis, treatment, and prevention of diabetes will become increasingly tangible.
Science (New York, N.Y.) 2005;307;5708;370-3
Fungi behaving badly.
Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK. firstname.lastname@example.org
Nature reviews. Microbiology 2005;3;11;832-3
Comparative apicomplexan genomics.
Nature reviews. Microbiology 2005;3;6;454-5
Genome of the host-cell transforming parasite Theileria annulata compared with T. parva.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. email@example.com
Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.
Funded by: Wellcome Trust
Science (New York, N.Y.) 2005;309;5731;131-3
The nuclear rim protein Amo1 is required for proper microtubule cytoskeleton organisation in fission yeast.
Cell Cycle Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London, WC2A 3PX, UK. firstname.lastname@example.org
Microtubules have a central role in cell division and cell polarity in eukaryotic cells. The fission yeast is a useful organism for studying microtubule regulation owing to the highly organised nature of its microtubular arrays. To better understand microtubule dynamics and organisation we carried out a screen that identified over 30 genes whose overexpression resulted in microtubule cytoskeleton abnormalities. Here we describe a novel nucleoporin-like protein, Amo1, identified in this screen. Amo1 localises to the nuclear rim in a punctate pattern that does not overlap with nuclear pore complex components. Amo1Delta cells are bent, and they have fewer microtubule bundles that curl around the cell ends. The microtubules in amo1Delta cells have longer dwelling times at the cell tips, and grow in an uncoordinated fashion. Lack of Amo1 also causes a polarity defect. Amo1 is not required for the microtubule loading of several factors affecting microtubule dynamics, and does not seem to be required for nuclear pore function.
Journal of cell science 2005;118;Pt 8;1705-14
Adding some SPICE to DAS.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge, UK. email@example.com
The distributed annotation system (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is carried out by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations.
Bioinformatics (Oxford, England) 2005;21 Suppl 2;ii40-1
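The separation between DAS servers and clients described in the abstract above can be sketched from the client side: a client builds one request per annotation server for the same region and merges the responses for display. The snippet below is a hedged illustration, not SPICE code. It assumes the URL layout of the DAS 1.x "features" command (prefix, "/das/", data source name, then a segment given as reference:start,stop); the server prefixes and source names are hypothetical placeholders.

```python
from urllib.parse import quote


def das_features_url(server_prefix, source, reference, start, stop):
    """Build a DAS 1.x style 'features' request URL for one segment.

    Assumed layout: PREFIX/das/SOURCE/features?segment=REF:START,STOP
    """
    if start > stop:
        raise ValueError("start must not exceed stop")
    segment = f"{reference}:{start},{stop}"
    return (
        f"{server_prefix.rstrip('/')}/das/{quote(source)}"
        f"/features?segment={quote(segment, safe=':,')}"
    )


# Hypothetical servers: a client would query each one for the same
# region and combine the returned annotations before drawing them.
servers = [
    ("http://das-one.example.org", "genes"),
    ("http://das-two.example.org", "repeats"),
]
for prefix, source in servers:
    print(das_features_url(prefix, source, "X", 1000, 2000))
```

Keeping URL construction in one small function reflects the design point of the protocol: the client needs no knowledge of how each server stores its data, only of the shared request format.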
The DNA sequence of the human X chromosome.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans.
Cenix BioScience GmbH, Tatzberg 47-51, D-01307 Dresden, Germany. email@example.com
A key challenge of functional genomics today is to generate well-annotated data sets that can be interpreted across different platforms and technologies. Large-scale functional genomics data often fail to connect to standard experimental approaches of gene characterization in individual laboratories. Furthermore, a lack of universal annotation standards for phenotypic data sets makes it difficult to compare different screening approaches. Here we address this problem in a screen designed to identify all genes required for the first two rounds of cell division in the Caenorhabditis elegans embryo. We used RNA-mediated interference to target 98% of all genes predicted in the C. elegans genome in combination with differential interference contrast time-lapse microscopy. Through systematic annotation of the resulting movies, we developed a phenotypic profiling system, which shows high correlation with cellular processes and biochemical pathways, thus enabling us to predict new functions for previously uncharacterized genes.
Mutation in the transcriptional coactivator EYA4 causes dilated cardiomyopathy and sensorineural hearing loss.
Harvard Medical School, Department of Genetics, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA.
We identified a human mutation that causes dilated cardiomyopathy and heart failure preceded by sensorineural hearing loss (SNHL). Unlike previously described mutations causing dilated cardiomyopathy that affect structural proteins, this mutation deletes 4,846 bp of the human transcriptional coactivator gene EYA4. To elucidate the roles of eya4 in heart function, we studied zebrafish embryos injected with antisense morpholino oligonucleotides. Attenuated eya4 transcript levels produced morphologic and hemodynamic features of heart failure. To determine why previously described mutated EYA4 alleles cause SNHL without heart disease, we examined biochemical interactions of mutant Eya4 peptides. Eya4 peptides associated with SNHL, but not the shortened 193-amino acid peptide associated with dilated cardiomyopathy and SNHL, bound wild-type Eya4 and associated with Six proteins. These data define unrecognized and crucial roles for Eya4-Six-mediated transcriptional regulation in normal heart function.
Nature genetics 2005;37;4;418-22
Visualizing profile-profile alignment: pairwise HMM logos.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
The availability of advanced profile-profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities of pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one upon the other. The aligned states are then highlighted and connected.
Availability: A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creation of pairwise HMM logos locally.
Funded by: Wellcome Trust
Bioinformatics (Oxford, England) 2005;21;12;2912-3
Nature reviews. Microbiology 2005;3;4;278-9
Wolbachia variability and host effects on crossing type in Culex mosquitoes.
Department of Zoology, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK. email@example.com
Wolbachia is a common maternally inherited bacterial symbiont able to induce crossing sterilities known as cytoplasmic incompatibility (CI) in insects. Wolbachia-modified sperm are unable to complete fertilization of uninfected ova, but a rescue function allows infected eggs to develop normally. By providing a reproductive advantage to infected females, Wolbachia can rapidly invade uninfected populations, and this could provide a mechanism for driving transgenes through pest populations. CI can also occur between Wolbachia-infected populations and is usually associated with the presence of different Wolbachia strains. In the Culex pipiens mosquito group (including the filariasis vector C. quinquefasciatus) a very unusual degree of complexity of Wolbachia-induced crossing-types has been reported, with partial or complete CI that can be unidirectional or bidirectional, yet no Wolbachia strain variation was found. Here we show variation between incompatible Culex strains in two Wolbachia ankyrin repeat-encoding genes associated with a prophage region, one of which is sex-specifically expressed in some strains, and also a direct effect of the host nuclear genome on CI rescue.
Funded by: Wellcome Trust
Two ways to trap a gene in mice.
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom. firstname.lastname@example.org
Proceedings of the National Academy of Sciences of the United States of America 2005;102;37;13001-2
The new cytogenetics: blurring the boundaries with molecular biology.
Institut für Humangenetik, Technische Universität München, Germany. email@example.com
Exciting advances in fluorescence in situ hybridization and array-based techniques are changing the nature of cytogenetics, in both basic research and molecular diagnostics. Cytogenetic analysis now extends beyond the simple description of the chromosomal status of a genome and allows the study of fundamental biological questions, such as the nature of inherited syndromes, the genomic changes that are involved in tumorigenesis and the three-dimensional organization of the human genome. The high resolution that is achieved by these techniques, particularly by microarray technologies such as array comparative genomic hybridization, is blurring the traditional distinction between cytogenetics and molecular biology.
Funded by: Wellcome Trust
Nature reviews. Genetics 2005;6;10;782-92
Structure and function of the notochord: an essential organ for chordate development.
Vertebrate Development and Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. firstname.lastname@example.org
The notochord is the defining structure of the chordates, and has essential roles in vertebrate development. It serves as a source of midline signals that pattern surrounding tissues and as a major skeletal element of the developing embryo. Genetic and embryological studies over the past decade have informed us about the development and function of the notochord. In this review, I discuss the embryonic origin, signalling roles and ultimate fate of the notochord, with an emphasis on structural aspects of notochord biology.
Funded by: Wellcome Trust
Development (Cambridge, England) 2005;132;11;2503-12
A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
We examined the coding sequence of 518 protein kinases, approximately 1.3 Mb of DNA per sample, in 25 breast cancers. In many tumors, we detected no somatic mutations. But a few had numerous somatic mutations with distinctive patterns indicative of either a mutator phenotype or a past exposure.
Nature genetics 2005;37;6;590-2
Genome-wide associations of gene expression variation in humans.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
Funded by: NHGRI NIH HHS: HG02790, HG03229, P50 HG002790, R01 HG003229; NIGMS NIH HHS: GM065509, P50 GM065509; Wellcome Trust
PLoS genetics 2005;1;6;e78
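Two of the multiple-test corrections named in the abstract above, Bonferroni and false discovery rate, can be sketched in a few lines. This is an illustrative sketch only: the FDR procedure shown is the Benjamini-Hochberg step-up method, which is an assumption since the abstract does not say which FDR procedure was used, and the P values in the usage lines are made up rather than taken from the study. The permutation approach is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests whose P value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure,
    which controls the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose P value satisfies p_(k) <= k/m * alpha.
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up P values for eight hypothetical SNP-expression tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(pvals)))
```

On these example values the Bonferroni threshold is the stricter of the two, which is the usual trade-off: it controls the chance of any false positive, whereas the FDR procedure controls the expected proportion of false positives among the reported associations.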
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Nature reviews. Microbiology 2005;3;8;586-7
Brothers in arms.
Nature reviews. Microbiology 2005;3;2;100-1
The Chlamydophila abortus genome sequence reveals an array of variable proteins that contribute to interspecies variation.
The Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. email@example.com
The obligate intracellular bacterial pathogen Chlamydophila abortus strain S26/3 (formerly the abortion subtype of Chlamydia psittaci) is an important cause of late gestation abortions in ruminants and pigs. Furthermore, although relatively rare, zoonotic infection can result in acute illness and miscarriage in pregnant women. The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae. The 1,144,377-bp genome contains 961 predicted coding sequences, 842 of which are conserved with those of Chlamydophila caviae and Chlamydophila pneumoniae. Within this conserved Cp. abortus core genome we have identified the major regions of variation and have focused our analysis on these loci, several of which were found to encode highly variable protein families, such as TMH/Inc and Pmp families, which are strong candidates for the source of diversity in host tropism and disease causation in this group of organisms. Significantly, Cp. abortus lacks any toxin genes, and also lacks genes involved in tryptophan metabolism and nucleotide salvaging (guaB is present as a pseudogene), suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species.
Genome research 2005;15;5;629-40
Don't mix radiocarbon and calendar years.
Funded by: Wellcome Trust: 057559
Null and conditional semaphorin 3B alleles using a flexible puroDeltatk loxP/FRT vector.
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.
In neural development, Semaphorin 3B (SEMA3B) is thought to play a role in guiding axons by repulsion. In nonneuronal tissue, SEMA3B has been postulated to be a tumor suppressor gene of lung and breast cancer. Much of the understanding of the function of members of the SEMA3 family has come from targeted deletion of these genes in mice (Sema3A, Sema3C, and Sema3F). Thus, targeted deletion of Sema3B in mice would prove invaluable in dissecting out its functions. To allow for maximum gene-targeting flexibility, we developed a generic targeting vector, pFlexible, containing the positive/negative selectable marker puroDeltatk and loxP and FRT recombination sites, and used it to target Sema3B in ES cells. Flpe- and Cre-mediated recombination in vitro generated ES cell lines that contained a conditional or null Sema3B allele, respectively, which were established as homozygous alleles in mice. Analysis of Sema3B null mice showed they were viable, fertile, and displayed no overt pathological abnormalities, suggesting an inherent correction mechanism or level of redundancy between the class 3 semaphorins. This targeting vector system has broad applicability in any knockout experiment and provides a flexible approach for the generation of modified alleles in mice.
Genesis (New York, N.Y. : 2000) 2005;41;4;171-8
The RASSF1A isoform of RASSF1 promotes microtubule stability and suppresses tumorigenesis.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
The RASSF1A isoform of RASSF1 is frequently inactivated by epigenetic alterations in human cancers, but it remains unclear if and how it acts as a tumor suppressor. RASSF1A overexpression reduces in vitro colony formation and the tumorigenicity of cancer cell lines in vivo. Conversely, RASSF1A knockdown causes multiple mitotic defects that may promote genomic instability. Here, we have used a genetic approach to address the function of RASSF1A as a tumor suppressor in vivo by targeted deletion of Rassf1A in the mouse. Rassf1A null mice were viable and fertile and displayed no pathological abnormalities. Rassf1A null embryonic fibroblasts displayed an increased sensitivity to microtubule depolymerizing agents. No overtly altered cell cycle parameters or aberrations in centrosome number were detected in Rassf1A null fibroblasts. Rassf1A null fibroblasts did not show increased sensitivity to microtubule poisons or DNA-damaging agents and showed no evidence of gross genomic instability, suggesting that cellular responses to genotoxins were unaffected. Rassf1A null mice showed an increased incidence of spontaneous tumorigenesis and decreased survival rate compared with wild-type mice. Irradiated Rassf1A null mice also showed increased tumor susceptibility, particularly to tumors associated with the gastrointestinal tract, compared with wild-type mice. Thus, our results demonstrate that Rassf1A acts as a tumor suppressor gene.
Funded by: Wellcome Trust
Molecular and cellular biology 2005;25;18;8356-67
Differentiating campomelic dysplasia from Cumming syndrome.
Funded by: NICHD NIH HHS: 5P01 HD 22657
American journal of medical genetics. Part A 2005;135;1;110-2
Validation of mRNA/EST-based gene predictions in human Xp11.4 revealed differences to the organization of the orthologous mouse locus.
Genome Analysis, Institute of Molecular Biotechnology, Beutenbergstr. 11, 07745, Jena, Germany.
Careful manual annotation of the human reference sequence provides a solid basis for the identification of disease-associated genes. Toward this end, we focused on a medically relevant 2.6-Mb region of human chromosome Xp11.4 between markers DXS9851 and DXS9751 and identified 16 transcription units according to the Vertebrate Genome Annotation (Vega) rules. In order to validate these annotations, we performed a comprehensive RT-PCR expression analysis and a human-mouse comparison. Despite the high overall genomic conservation of the region, this revealed remarkable differences in gene content between human and mouse. Whereas 12 of 16 annotations were confirmed by RT-PCR in human tissues, mouse orthologs could be identified and shown to be expressed for only seven genes. This indicates that a comprehensive and experimentally supported annotation effort of the human genome simultaneously highlights regions with striking differences in gene organization compared with other species and may indicate evolutionary events specific to the human lineage that demand further functional analyses.
Mammalian genome : official journal of the International Mammalian Genome Society 2005;16;12;934-41
Complete genome sequence and lytic phase transcription profile of a Coccolithovirus.
Plymouth Marine Laboratory, Prospect Place, The Hoe, Plymouth, PL1 3DH, UK. firstname.lastname@example.org
The genus Coccolithovirus is a recently discovered group of viruses that infect the globally important marine calcifying microalga Emiliania huxleyi. Among the 472 predicted genes of the 407,339-base pair genome are a variety of unexpected genes, most notably those involved in biosynthesis of ceramide, a sphingolipid known to induce apoptosis. Uniquely for algal viruses, it also contains six RNA polymerase subunits and a novel promoter, suggesting this virus encodes its own transcription machinery. Microarray transcriptomic analysis reveals that 65% of the predicted virus-encoded genes are expressed during lytic infection of E. huxleyi.
Science (New York, N.Y.) 2005;309;5737;1090-2
Recent spread of a Y-chromosomal lineage in northern China and Mongolia.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
We have identified a Y-chromosomal lineage that is unusually frequent in northeastern China and Mongolia, in which a haplotype cluster defined by 15 Y short tandem repeats was carried by approximately 3.3% of the males sampled from East Asia. The most recent common ancestor of this lineage lived 590 +/- 340 years ago (mean +/- SD), and it was detected in Mongolians and six Chinese minority populations. We suggest that the lineage was spread by Qing Dynasty (1644-1912) nobility, who were a privileged elite sharing patrilineal descent from Giocangga (died 1582), the grandfather of Manchu leader Nurhaci, and whose documented members formed approximately 0.4% of the minority population by the end of the dynasty.
Funded by: Wellcome Trust
American journal of human genetics 2005;77;6;1112-6
ssahaSNP - a polymorphism detection tool on a whole genome scale
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE 2005;251 - 252
An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. email@example.com
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
Nature genetics 2005;37;12;1320-2
Global hypomethylation of the genome in XX embryonic stem cells.
MRC Clinical Sciences Centre, ICFM, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK.
Embryonic stem (ES) cells are important tools in the study of gene function and may also become important in cell therapy applications. Establishment of stable XX ES cell lines from mouse blastocysts is relatively problematic owing to frequent loss of one of the two X chromosomes. Here we show that DNA methylation is globally reduced in XX ES cell lines and that this is attributable to the presence of two active X chromosomes. Hypomethylation affects both repetitive and unique sequences, the latter including differentially methylated regions that regulate expression of parentally imprinted genes. Methylation of differentially methylated regions can be restored coincident with elimination of an X chromosome in early-passage parthenogenetic ES cells, suggesting that selection against loss of methylation may provide the basis for X-chromosome instability. Finally, we show that hypomethylation is associated with reduced levels of the de novo DNA methyltransferases Dnmt3a and Dnmt3b and that ectopic expression of these factors restores global methylation levels.
Nature genetics 2005;37;11;1274-9