Research
Our research portfolio encompasses all levels of proteome complexity: from whole organism characterisation to subcellular organelles, from protein complexes to post-translational modifications.
- Protein Interactions
- Post Translational Modifications
- Proteogenomics
- Proteome Characterisation and Quantification
Protein Interactions
We employ affinity purification (epitope tagging) and tandem mass spectrometry to characterise protein complexes and map protein interaction networks and their dynamics.
Selected Publications:
-
Assignment of protein interactions from affinity purification/mass spectrometry data.
Journal of proteome research 2012;11;3;1462-74
PUBMED: 22283744; DOI: 10.1021/pr2011632
-
Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.
Molecular cell 2011;43;3;406-17
PUBMED: 21816347; PMC: 3332305; DOI: 10.1016/j.molcel.2011.05.031
-
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Cell stem cell 2010;6;4;382-95
PUBMED: 20362542; PMC: 2860244; DOI: 10.1016/j.stem.2010.03.004
-
Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
Molecular systems biology 2009;5;269
PUBMED: 19455133; PMC: 2694677; DOI: 10.1038/msb.2009.27
Post Translational Modifications
We study protein modifications at several levels: the proteome level, the protein level and the modification level. We develop bioinformatics methods and analytical strategies for the identification of all detectable modifications. We also use enrichment techniques to target specific modifications such as phosphorylation for detailed analysis.
Selected Publications:
-
Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.
Cell host & microbe 2012;12;2;246-58
PUBMED: 22901544; PMC: 3501726; DOI: 10.1016/j.chom.2012.06.005
-
Neurotransmitters drive combinatorial multistate postsynaptic density networks.
Science signaling 2009;2;68;ra19
PUBMED: 19401593; PMC: 3280897; DOI: 10.1126/scisignal.2000102
-
Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.
Molecular & cellular proteomics : MCP 2008;7;7;1331-48
PUBMED: 18388127; DOI: 10.1074/mcp.M700564-MCP200
-
Proteomic analysis of in vivo phosphorylated synaptic proteins.
The Journal of biological chemistry 2005;280;7;5972-82
PUBMED: 15572359; DOI: 10.1074/jbc.M411220200
Proteogenomics
Proteogenomics uses mass spectrometry data to experimentally validate gene products and to assist in the process of genome annotation and comparison. We develop tools and methods to facilitate use of proteomics data for this application.
Selected Publications:
-
Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.
Genome research 2011;21;5;756-67
PUBMED: 21460061; PMC: 3083093; DOI: 10.1101/gr.114272.110
-
Accurate and sensitive peptide identification with Mascot Percolator.
Journal of proteome research 2009;8;6;3176-81
PUBMED: 19338334; PMC: 2734080; DOI: 10.1021/pr800982s
-
Interrogating the human genome using uninterpreted mass spectrometry data.
Proteomics 2001;1;5;651-67
PUBMED: 11678035; DOI: 10.1002/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N
Proteome Characterisation and Quantification
We are also constantly developing novel mass spectrometry and informatics techniques to improve protein identification and quantification. Applications include profiling changes in protein expression in diseased organisms, analysing protein localisation to subcellular organelles, examining protein synthesis and turnover, and absolute quantification of protein species.
Selected Publications:
-
A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.
Cell host & microbe 2012;12;1;9-19
PUBMED: 22817984; PMC: 3414820; DOI: 10.1016/j.chom.2012.05.014
-
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.
Microbiology (Reading, England) 2011;157;Pt 10;2922-32
PUBMED: 21816880; PMC: 3353397; DOI: 10.1099/mic.0.050278-0
-
Characterization of the proteome, diseases and evolution of the human postsynaptic density.
Nature neuroscience 2011;14;1;19-21
PUBMED: 21170055; PMC: 3040565; DOI: 10.1038/nn.2719
-
Evolutionary expansion and anatomical specialization of synapse proteome complexity.
Nature neuroscience 2008;11;7;799-806
PUBMED: 18536710; PMC: 3624047; DOI: 10.1038/nn.2135
Technology
Technology and Instrumentation
We have long-standing expertise in sample preparation, peptide and protein separation and purification technologies, mass spectrometry and proteomics data analysis. Our well-equipped laboratory has a range of state-of-the-art high-resolution mass spectrometers that we combine with innovative tools and software to precisely identify and quantify proteins and their modifications in the proteome.
Informatics
- Mascot Percolator - Allows accurate and sensitive peptide identification from low- and high-accuracy mass spectrometry data.
- SloMo - We have adapted the original SLoMo tool for fast, high-throughput modification site localisation.
- ModX - A toolbox for the detection and validation of protein modifications.
Methods
eTAP-MS (endogenous tandem affinity purification – mass spectrometry)
In conjunction with the mouse research teams at the Sanger Institute, we have developed a technology that enables protein interactions in cell lines and tissues to be mapped both at the large-scale, systematic level and for individual genes of interest. The approach uses two affinity tags to isolate protein assemblies associated with a specific gene. Identifying and analysing these protein assemblies reveals the gene's biological function in cellular processes or signalling pathways.
Selected Method and Technology Publications:
-
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Molecular & cellular proteomics : MCP 2012;11;8;478-91
PUBMED: 22493177; PMC: 3412976; DOI: 10.1074/mcp.O111.014522
-
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Cell stem cell 2010;6;4;382-95
PUBMED: 20362542; PMC: 2860244; DOI: 10.1016/j.stem.2010.03.004
-
Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
Molecular systems biology 2009;5;269
PUBMED: 19455133; PMC: 2694677; DOI: 10.1038/msb.2009.27
Collaborations
Collaborating with Proteomic Mass Spectrometry Group
We work closely with research teams across the Sanger Institute research programmes and with many organisations around the world.
If you are interested in collaborating with us, please contact Jyoti Choudhary.
External Collaborations
- Jonathon Pines Laboratory (Gurdon Institute, University of Cambridge)
- Genes2Cognition (Edinburgh University)
- Proteomics Services (European Bioinformatics Institute)
- Transfusion Medicine (Department of Haematology, University of Cambridge)
- Kaser Lab (Department of Medicine, University of Cambridge)
- Kall Lab (Science for Life Laboratory, Stockholm)
- Fankel Lab (Imperial College London)
- Grant Lab (Department of Veterinary Medicine, University of Cambridge)
Please see the related projects section for internal Sanger Institute collaborations.
Opportunities
For career opportunities with our group, please visit the Sanger careers pages.
We also welcome applications from self-funded postdocs.
Software
Mascot Percolator
Mascot Percolator allows accurate and sensitive peptide identification from low- and high-accuracy mass spectrometry data. It combines the database search algorithm Mascot with the machine-learning algorithm Percolator to accurately score results.
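As a rough illustration of how rescored search results are typically filtered, the sketch below estimates q-values for peptide-spectrum matches (PSMs) by simple target-decoy counting and keeps the target PSMs below a threshold. It is not the algorithm used by Percolator or Mascot Percolator. The function names, the (score, is_decoy) input format and the 0.01 default threshold are assumptions made for this example.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); a higher score means a better match


def q_values(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each rank: decoys accepted so far / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value of a PSM is the smallest FDR at which it would still be accepted,
    # so take a running minimum from the worst-scoring PSM upwards.
    out = []
    running_min = float("inf")
    for (score, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        out.append((score, is_decoy, running_min))
    out.reverse()
    return out


def accepted_targets(psms: List[Psm], threshold: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value is at or below the threshold."""
    return [s for s, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    for score, is_decoy, q in q_values(example):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
    print(accepted_targets(example))
```

Tied scores are not handled specially here, and real tools apply additional corrections; the sketch only shows the thresholding idea.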
Turbo SloMo
We have adapted the original SLoMo tool for fast, high-throughput modification site localisation.
- Currently in development
ModX
ModX is a set of Perl scripts and libraries to automatically process the output of multiple PTM detection algorithms and validate detections using Mascot Percolator.
- Currently in development
Datasets
Downloadable Proteomics Datasets
EBI - PRIDE
- E.Coli Full Tryptic Digest CID
- E.Coli Full Tryptic Digest CID
- E.Coli Partial Tryptic Digest CID
- Human Universal Protein Standard CID Analysis
- E.Coli Full Tryptic Digest ETcaD
- E.Coli Partial Tryptic Digest ETcaD
- E.Coli Partial Tryptic Digest Sequential CID/ETcaD
- Human Universal Protein Standard ETcaD Analysis
- Acyl biotin exchange SILAC biological replicate 1, technical replicate 1
- Acyl biotin exchange SILAC biological replicate 1, technical replicate 2
- Acyl biotin exchange SILAC biological replicate 2, technical replicate 1
- Acyl biotin exchange SILAC biological replicate 2, technical replicate 2
- Acyl biotin exchange SILAC biological replicate 3, technical replicate 1
- Acyl biotin exchange SILAC biological replicate 3, technical replicate 2
- Click chemistry SILAC biological replicate 1, technical replicate 1
- Click chemistry SILAC biological replicate 1, technical replicate 2
- Click chemistry SILAC biological replicate 2, technical replicate 1
- Click chemistry SILAC biological replicate 2, technical replicate 2
- Mouse Brain Membrane
- Mouse ES Cell Nuclear Extract Gel Digest
- Mouse ES Cell Nuclear Extract In-Solution Digest
- Quantitative analysis of the Cyclin-A interactome during the cell cycle
- Quantitative analysis of the Cyclin-A interactome during the cell cycle
- Quantitative analysis of the Cyclin-A interactome during the cell cycle
- Control Exp II
- Oct4-FTAP Exp II
- Oct4-FTAP Exp III
- Control Exp I
- Oct4-FTAP Exp I
- Control Exp III
PeptideAtlas FTP
Training
Wellcome Trust Advanced Courses
- Protein Interactions and Networks (August 2013)
- Malaria Experimental Genetics (March 2013)
- Proteomics Bioinformatics (November 2012)
- Protein Interactions and Networks (August 2012)
References
2013 Publications
-
Mechanisms controlling the temporal degradation of Nek2A and Kif18A by the APC/C-Cdc20 complex.
The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
The Anaphase Promoting Complex/Cyclosome (APC/C) in complex with its co-activator Cdc20 is responsible for targeting proteins for ubiquitin-mediated degradation during mitosis. The activity of APC/C-Cdc20 is inhibited during prometaphase by the Spindle Assembly Checkpoint (SAC) yet certain substrates escape this inhibition. Nek2A degradation during prometaphase depends on direct binding of Nek2A to the APC/C via a C-terminal MR dipeptide but whether this motif alone is sufficient is not clear. Here, we identify Kif18A as a novel APC/C-Cdc20 substrate and show that Kif18A degradation depends on a C-terminal LR motif. However in contrast to Nek2A, Kif18A is not degraded until anaphase showing that additional mechanisms contribute to Nek2A degradation. We find that dimerization via the leucine zipper, in combination with the MR motif, is required for stable Nek2A binding to and ubiquitination by the APC/C. Nek2A and the mitotic checkpoint complex (MCC) have an overlap in APC/C subunit requirements for binding and we propose that Nek2A binds with high affinity to apo-APC/C and is degraded by the pool of Cdc20 that avoids inhibition by the SAC.
Funded by: Wellcome Trust: 079643/Z/06/Z
The EMBO journal 2013;32;2;303-14
PUBMED: 23288039; PMC: 3553385; DOI: 10.1038/emboj.2012.335
2012 Publications
-
The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.
Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA
Molecular & cellular proteomics : MCP 2012;11;12;1682-9
PUBMED: 22949509; PMC: 3518121; DOI: 10.1074/mcp.O112.021543
-
Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.
Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.
Funded by: Wellcome Trust: 079643/Z/06/Z, 089084
Cell host & microbe 2012;12;2;246-58
PUBMED: 22901544; PMC: 3501726; DOI: 10.1016/j.chom.2012.06.005
-
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
Funded by: Wellcome Trust: 079643/Z/06/Z
Molecular & cellular proteomics : MCP 2012;11;8;478-91
PUBMED: 22493177; PMC: 3412976; DOI: 10.1074/mcp.O111.014522
-
A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Calcium-dependent protein kinases (CDPKs) play key regulatory roles in the life cycle of the malaria parasite, but in many cases their precise molecular functions are unknown. Using the rodent malaria parasite Plasmodium berghei, we show that CDPK1, which is known to be essential in the asexual blood stage of the parasite, is expressed in all life stages and is indispensable during the sexual mosquito life-cycle stages. Knockdown of CDPK1 in sexual stages resulted in developmentally arrested parasites and prevented mosquito transmission, and these effects were independent of the previously proposed function for CDPK1 in regulating parasite motility. In-depth translational and transcriptional profiling of arrested parasites revealed that CDPK1 translationally activates mRNA species in the developing zygote that in macrogametes remain repressed via their 3' and 5'UTRs. These findings indicate that CDPK1 is a multifunctional protein that translationally regulates mRNAs to ensure timely and stage-specific protein expression.
Funded by: Medical Research Council: G0501670; Wellcome Trust: 079643/Z/06/Z, WT098051
Cell host & microbe 2012;12;1;9-19
PUBMED: 22817984; PMC: 3414820; DOI: 10.1016/j.chom.2012.05.014
-
Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus NRBP1 is a conserved regulator of cell fate, that plays an important role in tumour suppression.
Funded by: Cancer Research UK; Medical Research Council: G0600127; Wellcome Trust
The EMBO journal 2012;31;11;2486-97
PUBMED: 22510880; PMC: 3365428; DOI: 10.1038/emboj.2012.91
-
Assignment of protein interactions from affinity purification/mass spectrometry data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom. mp3@sanger.ac.uk
The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These identify not only core complex subunits but also the more inaccessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to matrix, tag, and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at the low level and in a larger scale. This review aims to summarize the variety of methods that have been used to distinguish contaminants from specific interactions in the past few years, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to give awareness on the processing that takes place before an interaction list is reported and on the different types of list curation approaches suited to the different experiments.
Funded by: Wellcome Trust: 079643/Z/06/Z
Journal of proteome research 2012;11;3;1462-74
PUBMED: 22283744; DOI: 10.1021/pr2011632
-
De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.
Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK. kirov@cardiff.ac.uk
A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.
Funded by: Medical Research Council: G0800509; NIMH NIH HHS: MH066392-05A1
Molecular psychiatry 2012;17;2;142-53
PUBMED: 22083728; PMC: 3603134; DOI: 10.1038/mp.2011.154
-
Comparative study of human and mouse postsynaptic proteomes finds high compositional conservation and abundance differences for key synaptic proteins.
Molecular Physiology of the Synapse Laboratory, Institut de Recerca de l'Hospital de la Santa Creu i Sant Pau, UAB, Barcelona, Catalonia, Spain. ABayesP@santpau.cat
Direct comparison of protein components from human and mouse excitatory synapses is important for determining the suitability of mice as models of human brain disease and to understand the evolution of the mammalian brain. The postsynaptic density is a highly complex set of proteins organized into molecular networks that play a central role in behavior and disease. We report the first direct comparison of the proteome of triplicate isolates of mouse and human cortical postsynaptic densities. The mouse postsynaptic density comprised 1556 proteins and the human one 1461. A large compositional overlap was observed; more than 70% of human postsynaptic density proteins were also observed in the mouse postsynaptic density. Quantitative analysis of postsynaptic density components in both species indicates a broadly similar profile of abundance but also shows that there is higher abundance variation between species than within species. Well known components of this synaptic structure are generally more abundant in the mouse postsynaptic density. Significant inter-species abundance differences exist in some families of key postsynaptic density proteins including glutamatergic neurotransmitter receptors and adaptor proteins. Furthermore, we have identified a closely interacting set of molecules enriched in the human postsynaptic density that could be involved in dendrite and spine structural plasticity. Understanding synapse proteome diversity within and between species will be important to further our understanding of brain complexity and disease.
Funded by: Medical Research Council; Wellcome Trust
PloS one 2012;7;10;e46683
PUBMED: 23071613; PMC: 3465276; DOI: 10.1371/journal.pone.0046683
-
SynGAP isoforms exert opposing effects on synaptic strength.
Centre for Integrative Physiology, University of Edinburgh, Edinburgh EH8 9XD, UK.
Alternative promoter usage and alternative splicing enable diversification of the transcriptome. Here we demonstrate that the function of Synaptic GTPase-Activating Protein (SynGAP), a key synaptic protein, is determined by the combination of its amino-terminal sequence with its carboxy-terminal sequence. 5' rapid amplification of cDNA ends and primer extension show that different N-terminal protein sequences arise through alternative promoter usage that are regulated by synaptic activity and postnatal age. Heterogeneity in C-terminal protein sequence arises through alternative splicing. Overexpression of SynGAP α1 versus α2 C-termini-containing proteins in hippocampal neurons has opposing effects on synaptic strength, decreasing and increasing miniature excitatory synaptic currents amplitude/frequency, respectively. The magnitude of this C-terminal-dependent effect is modulated by the N-terminal peptide sequence. This is the first demonstration that activity-dependent alternative promoter usage can change the function of a synaptic protein at excitatory synapses. Furthermore, the direction and degree of synaptic modulation exerted by different protein isoforms from a single gene locus is dependent on the combination of differential promoter usage and alternative splicing.
Funded by: Medical Research Council: G0902044(94018); Wellcome Trust
Nature communications 2012;3;900
PUBMED: 22692543; PMC: 3621422; DOI: 10.1038/ncomms1900
2011 Publications
-
Coordinating cell cycle progression via cyclin specificity.
Cell cycle (Georgetown, Tex.) 2011;10;24;4195-6
PUBMED: 22156915; DOI: 10.4161/cc.10.24.18395
-
APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.
The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.
Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint proteins complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z
Nature cell biology 2011;13;10;1234-43
PUBMED: 21926987; PMC: 3188299; DOI: 10.1038/ncb2347
-
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.
Funded by: Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z
Microbiology (Reading, England) 2011;157;Pt 10;2922-32
PUBMED: 21816880; PMC: 3353397; DOI: 10.1099/mic.0.050278-0
-
Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.
The Gurdon Institute, University of Cambridge, Cambridge, UK.
Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.
Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z
Molecular cell 2011;43;3;406-17
PUBMED: 21816347; PMC: 3332305; DOI: 10.1016/j.molcel.2011.05.031
-
Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
Funded by: Cancer Research UK; Wellcome Trust: 077198
Genome research 2011;21;5;756-67
PUBMED: 21460061; PMC: 3083093; DOI: 10.1101/gr.114272.110
-
Citrobacter rodentium is an unstable pathogen showing evidence of significant genomic flux.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
Citrobacter rodentium is a natural mouse pathogen that causes attaching and effacing (A/E) lesions. It shares a common virulence strategy with the clinically significant human A/E pathogens enteropathogenic E. coli (EPEC) and enterohaemorrhagic E. coli (EHEC) and is widely used to model this route of pathogenesis. We previously reported the complete genome sequence of C. rodentium ICC168, where we found that the genome displayed many characteristics of a newly evolved pathogen. In this study, through PFGE, sequencing of isolates showing variation, whole genome transcriptome analysis and examination of the mobile genetic elements, we found that, consistent with our previous hypothesis, the genome of C. rodentium is unstable as a result of repeat-mediated, large-scale genome recombination and because of active transposition of mobile genetic elements such as the prophages. We sequenced an additional C. rodentium strain, EX-33, to reveal that the reference strain ICC168 is representative of the species and that most of the inactivating mutations were common to both isolates and likely to have occurred early on in the evolution of this pathogen. We draw parallels with the evolution of other bacterial pathogens and conclude that C. rodentium is a recently evolved pathogen that may have emerged alongside the development of inbred mice as a model for human disease.
Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust
PLoS pathogens 2011;7;4;e1002018
PUBMED: 21490962; PMC: 3072379; DOI: 10.1371/journal.ppat.1002018
-
Characterization of the proteome, diseases and evolution of the human postsynaptic density.
Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.
We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.
Funded by: Medical Research Council: G0802238(89569); Wellcome Trust: 066717, 077155
Nature neuroscience 2011;14;1;19-21
PUBMED: 21170055; PMC: 3040565; DOI: 10.1038/nn.2719
2010 Publications
-
Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.
Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.
Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.
Funded by: Wellcome Trust
Genes & development 2010;24;24;2772-7
PUBMED: 21159818; PMC: 3003195; DOI: 10.1101/gad.606110
-
A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk
A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.
Funded by: Wellcome Trust
Journal of bacteriology 2010;192;21;5746-54
PUBMED: 20817773; PMC: 2953684; DOI: 10.1128/JB.00659-10
-
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk
The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.
Funded by: Wellcome Trust
Cell stem cell 2010;6;4;382-95
PUBMED: 20362542; PMC: 2860244; DOI: 10.1016/j.stem.2010.03.004
-
Cross species proteomics.
Department Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK.
Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F004605/1
Methods in molecular biology (Clifton, N.J.) 2010;604;123-35
PUBMED: 20013368; DOI: 10.1007/978-1-60761-444-9_9
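The mass-matching problem described in the abstract above can be illustrated with a short calculation. This is a minimal sketch, not taken from the chapter: the two peptide sequences are hypothetical, and the residue masses are standard monoisotopic values. It shows how a single conservative substitution (alanine to valine) shifts the peptide mass by about 28 Da, far more than the precursor mass tolerances usually set in database searches.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Return the neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Hypothetical tryptic peptides: the database (reference species) sequence
# and a homologue from the unsequenced species with one A -> V substitution.
reference = "LSAVEGK"
homologue = "LSVVEGK"

delta = peptide_mass(homologue) - peptide_mass(reference)
ppm = abs(delta) / peptide_mass(reference) * 1e6

print(f"reference mass: {peptide_mass(reference):.4f} Da")
print(f"homologue mass: {peptide_mass(homologue):.4f} Da")
print(f"mass shift: {delta:.4f} Da ({ppm:.0f} ppm)")
```

A sequence-similarity search would still score these two peptides as near-identical, whereas a mass-based search against the reference sequence would miss the homologue unless the substitution is allowed for explicitly.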
-
Scoring and validation of tandem MS peptide identification methods.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.
Methods in molecular biology (Clifton, N.J.) 2010;604;43-53
PUBMED: 20013363; DOI: 10.1007/978-1-60761-444-9_4
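One common way to turn arbitrary search scores into a significance measure is target-decoy estimation of the false discovery rate (FDR), reported per match as a q-value. The sketch below is a generic illustration under simple assumptions (a concatenated target/decoy search, with FDR estimated as decoys divided by targets); it is not the specific software discussed in the chapter, and the function name and example scores are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to peptide-spectrum matches (PSMs).

    Each PSM is (score, is_decoy), where a higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst score upwards.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


# Hypothetical scores: three target hits and two decoy hits.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
for score, is_decoy, q in q_values(example):
    label = "decoy" if is_decoy else "target"
    print(f"score={score:5.1f}  {label:6s}  q={q:.3f}")
```

Filtering at a chosen q-value (for example 0.01) then gives a set of identifications with an estimated error rate that is comparable across search engines, independent of the underlying score scale.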
2009 Publications
-
Cell biology. Evolving cell signals.
Proteomic Mass Spectrometry Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. moc@sanger.ac.uk
Science (New York, N.Y.) 2009;325;5948;1635-6
PUBMED: 19779182; DOI: 10.1126/science.1180331
-
Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.
Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom. tl2@sanger.ac.uk
Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm² reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.
Funded by: Wellcome Trust
Journal of bacteriology 2009;191;17;5377-86
PUBMED: 19542279; PMC: 2725610; DOI: 10.1128/JB.00597-09
-
A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.
Funded by: Wellcome Trust
PLoS genetics 2009;5;7;e1000569
PUBMED: 19609351; PMC: 2704369; DOI: 10.1371/journal.pgen.1000569
-
Accurate and sensitive peptide identification with Mascot Percolator.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.
Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well performing machine learning method for rescoring database search results, and demonstrate it to be amenable for both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand alone tool or integrated into existing data analysis pipelines.
Funded by: Wellcome Trust: 077198
Journal of proteome research 2009;8;6;3176-81
PUBMED: 19338334; PMC: 2734080; DOI: 10.1021/pr800982s
-
Recent developments in proteome informatics for mass spectrometry analysis.
Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK.
Mass spectrometry has become the pre-eminent analytical method for the study of proteins and proteomes in post-genome science. The high volumes of complex spectra and data generated from such experiments represent new challenges for the field of bioinformatics. The past decade has seen an explosion of informatics tools targeted towards the processing, analysis, storage, and integration of mass spectrometry based proteomic data. In this review, some of the more recent developments in proteome informatics will be discussed. This includes new tools for predicting the properties of proteins and peptides which can be exploited in experimental proteomic design, and tools for the identification of peptides and proteins from their mass spectra. Similarly, informatics approaches are required for the move towards quantitative proteomics which are also briefly discussed. Finally, the growing number of proteomic data repositories and emerging data standards developed for the field are highlighted. These tools and technologies point the way towards the next phase of experimental proteomics and informatics challenges that the proteomics community will face.
Combinatorial chemistry & high throughput screening 2009;12;2;194-202
PUBMED: 19199887
-
Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.
Dept Veterinary Preclinical Sciences, University of Liverpool, Liverpool, UK. james.wright@manchester.ac.uk
Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).
Results: 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.
Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D006996/1, CFB17723
BMC genomics 2009;10;61
PUBMED: 19193216; PMC: 2644712; DOI: 10.1186/1471-2164-10-61
-
Neurotransmitters drive combinatorial multistate postsynaptic density networks.
Genes to Cognition, Wellcome Trust Sanger Institute, Cambridgeshire, UK.
The mammalian postsynaptic density (PSD) comprises a complex collection of approximately 1100 proteins. Despite extensive knowledge of individual proteins, the overall organization of the PSD is poorly understood. Here, we define maps of molecular circuitry within the PSD based on phosphorylation of postsynaptic proteins. Activation of a single neurotransmitter receptor, the N-methyl-D-aspartate receptor (NMDAR), changed the phosphorylation status of 127 proteins. Stimulation of ionotropic and metabotropic glutamate receptors and dopamine receptors activated overlapping networks with distinct combinatorial phosphorylation signatures. Using peptide array technology, we identified specific phosphorylation motifs and switching mechanisms responsible for the integration of neurotransmitter receptor pathways and their coordination of multiple substrates in these networks. These combinatorial networks confer high information-processing capacity and functional diversity on synapses, and their elucidation may provide new insights into disease mechanisms and new opportunities for drug discovery.
Funded by: Wellcome Trust: 066717
Science signaling 2009;2;68;ra19
PUBMED: 19401593; PMC: 3280897; DOI: 10.1126/scisignal.2000102
-
Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.
The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.
Funded by: Wellcome Trust
Molecular systems biology 2009;5;269
PUBMED: 19455133; PMC: 2694677; DOI: 10.1038/msb.2009.27
2008 Publications
-
Mapping multiprotein complexes by affinity purification and mass spectrometry.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
The combination of affinity purification and tandem mass spectrometry (MS) has emerged as a powerful approach to delineate biological processes. In particular, the use of epitope tags has allowed this approach to become scaleable and has bypassed difficulties associated with generation of antibodies. Single epitope tags and tandem affinity purification (TAP) tags have been used to systematically map protein complexes generating protein interaction data at a near proteome-wide scale. Recent developments in the design of tags, optimisation of purification conditions, experimental design and data analysis have greatly improved the sensitivity and specificity of this approach. Concomitant developments in MS, including high accuracy and high-throughput instrumentation together with quantitative MS methods, have facilitated large-scale and comprehensive analysis of multiprotein complexes.
Current opinion in biotechnology 2008;19;4;324-30
PUBMED: 18598764; DOI: 10.1016/j.copbio.2008.06.002
-
Evolutionary expansion and anatomical specialization of synapse proteome complexity.
Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.
Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.
Funded by: Medical Research Council; Wellcome Trust
Nature neuroscience 2008;11;7;799-806
PUBMED: 18536710; PMC: 3624047; DOI: 10.1038/nn.2135
-
Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.
We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.
Funded by: Wellcome Trust
Molecular & cellular proteomics : MCP 2008;7;7;1331-48
PUBMED: 18388127; DOI: 10.1074/mcp.M700564-MCP200
-
Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
It is a major challenge to develop effective sequence database search algorithms to translate molecular weight and fragment mass information obtained from tandem mass spectrometry into high quality peptide and protein assignments. We investigated the peptide identification performance of Mascot and X!Tandem for mass tolerance settings common for low and high accuracy mass spectrometry. We demonstrated that sensitivity and specificity of peptide identification can vary substantially for different mass tolerance settings, but this effect was more significant for Mascot. We present an adjusted Mascot threshold, which allows the user to freely select the best trade-off between sensitivity and specificity. The adjusted Mascot threshold was compared with the default Mascot and X!Tandem scoring thresholds and shown to be more sensitive at the same false discovery rates for both low and high accuracy mass spectrometry data.
Funded by: Wellcome Trust: 077198
Molecular & cellular proteomics : MCP 2008;7;5;962-70
PUBMED: 18216375; PMC: 2656932; DOI: 10.1074/mcp.M700293-MCP200
-
Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk
Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.
Funded by: Wellcome Trust
Journal of bacteriology 2008;190;7;2580-7
PUBMED: 18192390; PMC: 2293211; DOI: 10.1128/JB.01654-07
Team
Team members
- Jyoti Choudhary
- Head of Mass Spectrometry
- Mark Collins
- moc@sanger.ac.uk Senior Staff Scientist
- Mercedes Pardo Calvo
- Senior Staff Scientist
- James Wright
- jw13@sanger.ac.uk Senior Bioinformatician
- Lu Yu
- Senior Staff Scientist
Jyoti Choudhary
- Head of Mass Spectrometry
She received her Ph.D. from Imperial College London, in the Biological Mass Spectrometry group of Prof. Howard Morris. She continued her research as a post-doctoral fellow, developing methods to purify and characterise membrane protein complexes by mass spectrometry. In 1997 she joined the Bioanalytical Sciences division in GlaxoWellcome and was then recruited to the CellMap project, which was founded to pursue the development of proteomics technologies and investigate their value in drug discovery. This unit was spun out of GlaxoSmithKline, and she became a founding member of Cellzome AG in the UK.
Research
Dr. Choudhary’s research group at the Sanger Institute, Cambridge UK, is focused on developing and applying biochemical and analytical methods for proteomics applications.
Mark Collins
moc@sanger.ac.uk Senior Staff Scientist
I graduated with a Joint Honours degree in Biochemistry and Molecular Genetics from University College Dublin in 2000, during which I gained laboratory experience at the Johns Hopkins University School of Medicine. I spent a year working at the Centre for Liver Disease at the Mater Misericordiae hospital in Dublin before pursuing a PhD in Molecular Neuroscience at the University of Edinburgh under the supervision of Prof. Seth Grant. During my PhD I exploited and developed emerging biochemical approaches to characterise the mammalian postsynaptic proteome in terms of its components, post-translational modifications and organisation into multi-protein complexes.
Research
Since joining the Proteomic Mass Spectrometry group in 2005, I have combined my expertise in biochemistry with state-of-the-art mass spectrometry to tackle a range of biological problems. My research interests encompass comprehensive proteome interrogation and quantification, post-translational modifications and protein complexes. I am particularly interested in developing and applying novel methods to enrich for modified proteins/peptides (phosphorylation, palmitoylation) and in large-scale quantitative analysis of PTMs in perturbation experiments. In addition, I have a long-standing interest in dissecting post-synaptic protein complexes using combinations of peptide-affinity and tandem affinity purification with stable isotope labelling strategies for differential quantification.
References
-
Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.
Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.
Funded by: Wellcome Trust: 079643/Z/06/Z, 089084
Cell host & microbe 2012;12;2;246-58
PUBMED: 22901544; PMC: 3501726; DOI: 10.1016/j.chom.2012.06.005
-
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
Funded by: Wellcome Trust: 079643/Z/06/Z
Molecular & cellular proteomics : MCP 2012;11;8;478-91
PUBMED: 22493177; PMC: 3412976; DOI: 10.1074/mcp.O111.014522
-
A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
Calcium-dependent protein kinases (CDPKs) play key regulatory roles in the life cycle of the malaria parasite, but in many cases their precise molecular functions are unknown. Using the rodent malaria parasite Plasmodium berghei, we show that CDPK1, which is known to be essential in the asexual blood stage of the parasite, is expressed in all life stages and is indispensable during the sexual mosquito life-cycle stages. Knockdown of CDPK1 in sexual stages resulted in developmentally arrested parasites and prevented mosquito transmission, and these effects were independent of the previously proposed function for CDPK1 in regulating parasite motility. In-depth translational and transcriptional profiling of arrested parasites revealed that CDPK1 translationally activates mRNA species in the developing zygote that in macrogametes remain repressed via their 3' and 5'UTRs. These findings indicate that CDPK1 is a multifunctional protein that translationally regulates mRNAs to ensure timely and stage-specific protein expression.
Funded by: Medical Research Council: G0501670; Wellcome Trust: 079643/Z/06/Z, WT098051
Cell host & microbe 2012;12;1;9-19
PUBMED: 22817984; PMC: 3414820; DOI: 10.1016/j.chom.2012.05.014
-
APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.
The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.
Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint proteins complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.
Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z
Nature cell biology 2011;13;10;1234-43
PUBMED: 21926987; PMC: 3188299; DOI: 10.1038/ncb2347
-
Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.
The Gurdon Institute, University of Cambridge, Cambridge, UK.
Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.
Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z
Molecular cell 2011;43;3;406-17
PUBMED: 21816347; PMC: 3332305; DOI: 10.1016/j.molcel.2011.05.031
-
Characterization of the proteome, diseases and evolution of the human postsynaptic density.
Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.
We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.
Funded by: Medical Research Council: G0802238(89569); Wellcome Trust: 066717, 077155
Nature neuroscience 2011;14;1;19-21
PUBMED: 21170055; PMC: 3040565; DOI: 10.1038/nn.2719
-
Cell biology. Evolving cell signals.
Proteomic Mass Spectrometry Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. moc@sanger.ac.uk
Science (New York, N.Y.) 2009;325;5948;1635-6
PUBMED: 19779182; DOI: 10.1126/science.1180331
-
Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.
The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.
Funded by: Wellcome Trust
Molecular systems biology 2009;5;269
PUBMED: 19455133; PMC: 2694677; DOI: 10.1038/msb.2009.27
-
Evolutionary expansion and anatomical specialization of synapse proteome complexity.
Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.
Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.
Funded by: Medical Research Council; Wellcome Trust
Nature neuroscience 2008;11;7;799-806
PUBMED: 18536710; PMC: 3624047; DOI: 10.1038/nn.2135
-
Proteomic analysis of in vivo phosphorylated synaptic proteins.
Division of Neuroscience, University of Edinburgh, Edinburgh EH8 9JZ, UK.
In the nervous system, protein phosphorylation is an essential feature of synaptic function. Although protein phosphorylation is known to be important for many synaptic processes and in disease, little is known about global phosphorylation of synaptic proteins. Heterogeneity and low abundance make protein phosphorylation analysis difficult, particularly for mammalian tissue samples. Using a new approach, combining both protein and peptide immobilized metal affinity chromatography and mass spectrometry data acquisition strategies, we have produced the first large scale map of the mouse synapse phosphoproteome. We report over 650 phosphorylation events corresponding to 331 sites (289 have been unambiguously assigned), 92% of which are novel. These represent 79 proteins, half of which are novel phosphoproteins, and include several highly phosphorylated proteins such as MAP1B (33 sites) and Bassoon (30 sites). An additional 149 candidate phosphoproteins were identified by profiling the composition of the protein immobilized metal affinity chromatography enrichment. All major synaptic protein classes were observed, including components of important pre- and postsynaptic complexes as well as low abundance signaling proteins. Bioinformatic and in vitro phosphorylation assays of peptide arrays suggest that a small number of kinases phosphorylate many proteins and that each substrate is phosphorylated by many kinases. These data substantially increase existing knowledge of synapse protein phosphorylation and support a model where the synapse phosphoproteome is functionally organized into a highly interconnected signaling network.
The Journal of biological chemistry 2005;280;7;5972-82
PUBMED: 15572359; DOI: 10.1074/jbc.M411220200
Mercedes Pardo Calvo
Senior Staff Scientist
I graduated with Honours in Pharmacy and then completed a PhD in Microbiology at Universidad Complutense de Madrid in 2000 under the supervision of Drs Gil and Nombela, also spending four months at McGill University. My PhD research explored yeast cell wall biogenesis using proteomics, genetics and cell biology. I then did postdoctoral research at the CRUK London Research Institute under the supervision of Sir Paul Nurse, using fission yeast as a model system. I combined genetics and cell biology to characterize the role of the microtubule cytoskeleton during cytokinesis and to identify novel regulators of its organization and dynamics.
Research
I joined the Proteomic Mass Spectrometry group in 2004, setting out to characterize protein interactions using affinity purification and mass spectrometry. In collaboration with the Skarnes and Bradley labs, I developed the endogenous TAP (tandem affinity purification) technology in mouse embryonic stem cells, applying it to study chromatin-associated proteins regulating stem cell biology. I have recently shifted my interest to enzymes that introduce less well-known protein modifications. Other areas of interest include lncRNA-protein interactions. I am also involved in the Wellcome Trust Advanced Courses, teaching TAP in the Genome-wide Approaches with Fission Yeast and Protein Interactions and Networks courses.
References
-
Mechanisms controlling the temporal degradation of Nek2A and Kif18A by the APC/C-Cdc20 complex.
The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
The Anaphase Promoting Complex/Cyclosome (APC/C) in complex with its co-activator Cdc20 is responsible for targeting proteins for ubiquitin-mediated degradation during mitosis. The activity of APC/C-Cdc20 is inhibited during prometaphase by the Spindle Assembly Checkpoint (SAC) yet certain substrates escape this inhibition. Nek2A degradation during prometaphase depends on direct binding of Nek2A to the APC/C via a C-terminal MR dipeptide but whether this motif alone is sufficient is not clear. Here, we identify Kif18A as a novel APC/C-Cdc20 substrate and show that Kif18A degradation depends on a C-terminal LR motif. However in contrast to Nek2A, Kif18A is not degraded until anaphase showing that additional mechanisms contribute to Nek2A degradation. We find that dimerization via the leucine zipper, in combination with the MR motif, is required for stable Nek2A binding to and ubiquitination by the APC/C. Nek2A and the mitotic checkpoint complex (MCC) have an overlap in APC/C subunit requirements for binding and we propose that Nek2A binds with high affinity to apo-APC/C and is degraded by the pool of Cdc20 that avoids inhibition by the SAC.
Funded by: Wellcome Trust: 079643/Z/06/Z
The EMBO journal 2013;32;2;303-14
PUBMED: 23288039; PMC: 3553385; DOI: 10.1038/emboj.2012.335
-
Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus NRBP1 is a conserved regulator of cell fate, that plays an important role in tumour suppression.
Funded by: Cancer Research UK; Medical Research Council: G0600127; Wellcome Trust
The EMBO journal 2012;31;11;2486-97
PUBMED: 22510880; PMC: 3365428; DOI: 10.1038/emboj.2012.91
-
Assignment of protein interactions from affinity purification/mass spectrometry data.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom. mp3@sanger.ac.uk
The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These identify not only core complex subunits but also the more inaccessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to matrix, tag, and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at the low level and in a larger scale. This review aims to summarize the variety of methods that have been used to distinguish contaminants from specific interactions in the past few years, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to give awareness on the processing that takes place before an interaction list is reported and on the different types of list curation approaches suited to the different experiments.
Funded by: Wellcome Trust: 079643/Z/06/Z
Journal of proteome research 2012;11;3;1462-74
PUBMED: 22283744; DOI: 10.1021/pr2011632
-
Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.
Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.
Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.
Funded by: Wellcome Trust
Genes & development 2010;24;24;2772-7
PUBMED: 21159818; PMC: 3003195; DOI: 10.1101/gad.606110
-
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk
The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.
Funded by: Wellcome Trust
Cell stem cell 2010;6;4;382-95
PUBMED: 20362542; PMC: 2860244; DOI: 10.1016/j.stem.2010.03.004
-
Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk
Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.
Funded by: Wellcome Trust
Journal of bacteriology 2008;190;7;2580-7
PUBMED: 18192390; PMC: 2293211; DOI: 10.1128/JB.01654-07
-
Genetic and proteomic evidences support the localization of yeast enolase in the cell surface.
Departamento de Microbiología II, Facultad de Farmacia, UCM, Madrid, Spain.
Although enolase, other glycolytic enzymes, and a variety of cytoplasmic proteins lacking an N-terminal secretion signal have been widely described as located at the cell surface in yeast and in mammalian cells, their presence in this external location is still controversial. Here, we report that different experimental approaches (genetics, cellular biology and proteomics) show that yeast enolase can reach the cell surface and describe the protein regions involved in its cell surface targeting. Hybrid enolase truncates, fused at their C terminus with the yeast internal invertase or green fluorescent protein (GFP) as reporter proteins, proved that the 169 N-terminal amino acids are sufficient to target the protein to the cell surface. Furthermore, the enolase-GFP fusion co-localized with a plasma membrane marker. Enolase was also identified among membrane proteins obtained by a purification protocol that includes sodium carbonate to prevent cytoplasmic contamination. These proteins were analyzed by SDS-PAGE, trypsin digestion and LC-MS/MS for peptide identification. Elongation factors, mitochondrial membrane proteins and a mannosyltransferase involved in cell wall mannan biosynthesis were also identified in this fraction.
Proteomics 2006;6 Suppl 1;S107-18
PUBMED: 16544286; DOI: 10.1002/pmic.200500479
-
The nuclear rim protein Amo1 is required for proper microtubule cytoskeleton organisation in fission yeast.
Cell Cycle Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London, WC2A 3PX, UK. mp3@sanger.ac.uk
Microtubules have a central role in cell division and cell polarity in eukaryotic cells. The fission yeast is a useful organism for studying microtubule regulation owing to the highly organised nature of its microtubular arrays. To better understand microtubule dynamics and organisation we carried out a screen that identified over 30 genes whose overexpression resulted in microtubule cytoskeleton abnormalities. Here we describe a novel nucleoporin-like protein, Amo1, identified in this screen. Amo1 localises to the nuclear rim in a punctate pattern that does not overlap with nuclear pore complex components. Amo1Delta cells are bent, and they have fewer microtubule bundles that curl around the cell ends. The microtubules in amo1Delta cells have longer dwelling times at the cell tips, and grow in an uncoordinated fashion. Lack of Amo1 also causes a polarity defect. Amo1 is not required for the microtubule loading of several factors affecting microtubule dynamics, and does not seem to be required for nuclear pore function.
Journal of cell science 2005;118;Pt 8;1705-14
PUBMED: 15797925; DOI: 10.1242/jcs.02305
-
PST1 and ECM33 encode two yeast cell surface GPI proteins important for cell wall integrity.
Departamento de Microbiología II, Facultad de Farmacia, Universidad Complutense, Pza. Ramón y Cajal s/n, 28040 Madrid, Spain.
Pst1p was previously identified as a protein secreted by yeast regenerating protoplasts, which suggests a role in cell wall construction. ECM33 encodes a protein homologous to Pst1p, and both of them display typical features of GPI-anchored proteins and a characteristic receptor L-domain. Pst1p and Ecm33p are both localized to the cell surface, Pst1p being at the cell membrane and possibly also in the periplasmic space. Here, the characterization of pst1Delta, ecm33Delta and pst1Delta ecm33Delta mutants is described. Deletion of ECM33 leads to a weakened cell wall, and this defect is further aggravated by simultaneous deletion of PST1. As a result, the ecm33Delta mutant displays increased levels of activated Slt2p, the MAP kinase of the cell integrity pathway, and relies on a functional Slt2-mediated cell integrity pathway to ensure viability. Analyses of model glycosylated proteins show glycosylation defects in the ecm33Delta mutant. Ecm33p is also important for proper cell wall ultrastructure organization and, furthermore, for the correct assembly of the mannoprotein outer layer of the cell wall. Pst1p seems to act in the compensatory mechanism activated upon cell wall damage and, in these conditions, may partially substitute for Ecm33p.
Microbiology (Reading, England) 2004;150;Pt 12;4157-70
PUBMED: 15583168; DOI: 10.1099/mic.0.26924-0
-
Equatorial retention of the contractile actin ring by microtubules during cytokinesis.
Cell Cycle Laboratory, Cancer Research UK London Research Institute, 44 Lincoln's Inn Fields, London WC2A 3PX, UK. mercedes.pardo@cancer.org.uk
In most eukaryotes cytokinesis is brought about by a contractile actin ring located at the division plane. Here, in fission yeast the actin ring was found to be required to generate late-mitotic microtubular structures located at the division plane, and these in turn maintained the medial position of the actin ring. When these microtubular structures were disrupted, the actin ring migrated away from the cell middle in a membrane traffic-dependent manner, resulting in asymmetrical cell divisions that led to genomic instability. We propose that these microtubular structures contribute to a checkpoint control that retains the equatorial position of the ring when progression through cytokinesis is delayed.
Science (New York, N.Y.) 2003;300;5625;1569-74
PUBMED: 12791993; DOI: 10.1126/science.1084671
James Wright
jw13@sanger.ac.uk Senior Bioinformatician
In 2000 I studied for a degree in Biological and Computational Science at UMIST, including a one-year placement with EST Informatics at AstraZeneca, focussing on the exploitation of microarray data. My dissertation used machine learning methods to classify genomic sequences. I then studied for a master’s in Physical Methods for Bioanalysis and Post Genomic Science, investigating the use of domains to classify phosphatases. In 2005 I began a PhD tackling cross-species proteomics using lab-based and in silico strategies with Rob Beynon at Liverpool University and Simon Hubbard at the University of Manchester. In 2009 I joined the Sanger Institute.
Research
My research interests include most aspects of proteomic bioinformatics and data analysis. I am currently working on projects involving unexpected PTM detection, validation, and localisation (ModX, Turbo-SloMo), machine learning methods to improve protein identification (Mascot Percolator), label-free protein quantification and GO term enrichment, and proteogenomic genome annotation and characterisation. I also provide proteomic informatics support to a wide range of internal and external proteomics projects.
References
-
The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.
Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA
Molecular & cellular proteomics : MCP 2012;11;12;1682-9
PUBMED: 22949509; PMC: 3518121; DOI: 10.1074/mcp.O112.021543
-
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
Funded by: Wellcome Trust: 079643/Z/06/Z
Molecular & cellular proteomics : MCP 2012;11;8;478-91
PUBMED: 22493177; PMC: 3412976; DOI: 10.1074/mcp.O111.014522
-
Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
Funded by: Cancer Research UK; Wellcome Trust: 077198
Genome research 2011;21;5;756-67
PUBMED: 21460061; PMC: 3083093; DOI: 10.1101/gr.114272.110
-
Cross species proteomics.
Department Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK.
Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.
Funded by: Biotechnology and Biological Sciences Research Council: BB/F004605/1
Methods in molecular biology (Clifton, N.J.) 2010;604;123-35
PUBMED: 20013368; DOI: 10.1007/978-1-60761-444-9_9
-
Recent developments in proteome informatics for mass spectrometry analysis.
Faculty of Life Sciences, University of Manchester, Manchester M139PT, UK.
Mass spectrometry has become the pre-eminent analytical method for the study of proteins and proteomes in post-genome science. The high volumes of complex spectra and data generated from such experiments represent new challenges for the field of bioinformatics. The past decade has seen an explosion of informatics tools targeted towards the processing, analysis, storage, and integration of mass spectrometry based proteomic data. In this review, some of the more recent developments in proteome informatics will be discussed. This includes new tools for predicting the properties of proteins and peptides which can be exploited in experimental proteomic design, and tools for the identification of peptides and proteins from their mass spectra. Similarly, informatics approaches are required for the move towards quantitative proteomics which are also briefly discussed. Finally, the growing number of proteomic data repositories and emerging data standards developed for the field are highlighted. These tools and technologies point the way towards the next phase of experimental proteomics and informatics challenges that the proteomics community will face.
Combinatorial chemistry & high throughput screening 2009;12;2;194-202
PUBMED: 19199887
-
Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.
Dept Veterinary Preclinical Sciences, University of Liverpool, Liverpool, UK. james.wright@manchester.ac.uk
Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).
Results: 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.
Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
Funded by: Biotechnology and Biological Sciences Research Council: BB/D006996/1, CFB17723
BMC genomics 2009;10;61
PUBMED: 19193216; PMC: 2644712; DOI: 10.1186/1471-2164-10-61
Lu Yu
- Senior Staff Scientist
After obtaining a BSc and MSc at Fudan University, I worked on organic mass spectrometry at the Shanghai Institute of Organic Chemistry. My PhD, supervised by Professor Simon Gaskell at UMIST Manchester, was on protein epitope mapping by mass spectrometry. I joined the Cell Map Project at GSK in 1999 and then Cellzome UK in 2001. During this period, I gained experience in high-throughput nano-scale LC-MS/MS analysis of protein complexes to decipher the APP processing pathway of Alzheimer’s disease. I also implemented 2DLC-MS/MS for proteome profiling of human cellular extracts, and optimized a nano-scale LC-MS/MS strategy for phosphoproteomics.
Research
I joined the Proteomic Mass Spectrometry team in early 2004. I have applied my broad expertise in the analysis of biomolecules, particularly in sample preparation and in the development and application of multidimensional HPLC coupled with mass spectrometry, to the characterisation and quantification of proteins and PTMs. Projects include protein complexes from mammalian cells, genome annotation, protein identification and quantification (using chemical derivatisation or label-free approaches) in bacterial studies, and de novo peptide sequencing. I also support bioinformatics development in the team for efficient data mining and result generation, as well as other projects, and manage and maintain the mass spectrometers and allied instruments in the lab.
References
-
Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
Funded by: Wellcome Trust: 079643/Z/06/Z
Molecular & cellular proteomics : MCP 2012;11;8;478-91
PUBMED: 22493177; PMC: 3412976; DOI: 10.1074/mcp.O111.014522
-
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.
Funded by: Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z
Microbiology (Reading, England) 2011;157;Pt 10;2922-32
PUBMED: 21816880; PMC: 3353397; DOI: 10.1099/mic.0.050278-0
-
A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk
A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.
Funded by: Wellcome Trust
Journal of bacteriology 2010;192;21;5746-54
PUBMED: 20817773; PMC: 2953684; DOI: 10.1128/JB.00659-10
-
An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.
Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk
The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.
Funded by: Wellcome Trust
Cell stem cell 2010;6;4;382-95
PUBMED: 20362542; PMC: 2860244; DOI: 10.1016/j.stem.2010.03.004
-
Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.
Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom. tl2@sanger.ac.uk
Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm² reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.
Funded by: Wellcome Trust
Journal of bacteriology 2009;191;17;5377-86
PUBMED: 19542279; PMC: 2725610; DOI: 10.1128/JB.00597-09
-
A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.
Funded by: Wellcome Trust
PLoS genetics 2009;5;7;e1000569
PUBMED: 19609351; PMC: 2704369; DOI: 10.1371/journal.pgen.1000569
-
Accurate and sensitive peptide identification with Mascot Percolator.
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.
Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well performing machine learning method for rescoring database search results, and demonstrate it to be amenable for both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand alone tool or integrated into existing data analysis pipelines.
Funded by: Wellcome Trust: 077198
Journal of proteome research 2009;8;6;3176-81
PUBMED: 19338334; PMC: 2734080; DOI: 10.1021/pr800982s
-
Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.
Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB101SA, United Kingdom.
We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.
Funded by: Wellcome Trust
Molecular & cellular proteomics : MCP 2008;7;7;1331-48
PUBMED: 18388127; DOI: 10.1074/mcp.M700564-MCP200
-
Proteomic analysis of in vivo phosphorylated synaptic proteins.
Division of Neuroscience, University of Edinburgh, Edinburgh EH8 9JZ, UK.
In the nervous system, protein phosphorylation is an essential feature of synaptic function. Although protein phosphorylation is known to be important for many synaptic processes and in disease, little is known about global phosphorylation of synaptic proteins. Heterogeneity and low abundance make protein phosphorylation analysis difficult, particularly for mammalian tissue samples. Using a new approach, combining both protein and peptide immobilized metal affinity chromatography and mass spectrometry data acquisition strategies, we have produced the first large scale map of the mouse synapse phosphoproteome. We report over 650 phosphorylation events corresponding to 331 sites (289 have been unambiguously assigned), 92% of which are novel. These represent 79 proteins, half of which are novel phosphoproteins, and include several highly phosphorylated proteins such as MAP1B (33 sites) and Bassoon (30 sites). An additional 149 candidate phosphoproteins were identified by profiling the composition of the protein immobilized metal affinity chromatography enrichment. All major synaptic protein classes were observed, including components of important pre- and postsynaptic complexes as well as low abundance signaling proteins. Bioinformatic and in vitro phosphorylation assays of peptide arrays suggest that a small number of kinases phosphorylate many proteins and that each substrate is phosphorylated by many kinases. These data substantially increase existing knowledge of synapse protein phosphorylation and support a model where the synapse phosphoproteome is functionally organized into a highly interconnected signaling network.
The Journal of biological chemistry 2005;280;7;5972-82
PUBMED: 15572359; DOI: 10.1074/jbc.M411220200
-
The three-dimensional structure and X-ray sequence reveal that trichomaglin is a novel S-like ribonuclease.
State Key Laboratory of Bio-organic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China.
Trichomaglin is a protein isolated from root tuber of the plant Maganlin (Trichosanthes Lepiniate, Cucurbitaceae). The crystal structure of trichomaglin has been determined by multiple-isomorphous replacement and refined at 2.2 Å resolution. The X-ray sequence was established, based on electron density combined with the experimentally determined N-terminal sequence, and the sequence information derived from mass spectroscopic analysis. X-ray sequence-based homolog search and the three-dimensional structure reveal that trichomaglin is a novel S-like RNase, which was confirmed by biological assay. Trichomaglin molecule contains an additional beta sheet in the HV(b) region, compared with the known plant RNase structures. Fourteen cysteine residues form seven disulfide bridges, more than those in the other known structures of S- and S-like RNases. His43 and His105 are expected to be the catalytic acid and base, respectively. Four hydrosulfate ions are bound in the active site pocket, three of them mimicking the substrate binding sites.
Structure (London, England : 1993) 2004;12;6;1015-25
PUBMED: 15274921; DOI: 10.1016/j.str.2004.03.023







