Proteomic Mass Spectrometry

We are interested in answering questions about the biology underlying health and disease by studying molecular processes and pathways. To do this, we use a range of techniques, including mass spectrometry, biochemistry, molecular biology and informatics, to study protein function, proteome composition and proteome organisation.

The Proteomic Mass Spectrometry team complements the research programmes of the Sanger Institute to further our understanding of the relationship between the genome and the proteome (the expressed protein complement). Through our research and collaborations we explore protein-protein interactions, cell signalling and protein expression in a range of human, pathogen and model organisms.

Research

Our research portfolio encompasses all levels of proteome complexity: from whole organism characterisation to subcellular organelles, from protein complexes to post-translational modifications.

Protein Interactions

We employ affinity purification (epitope tagging) and tandem mass spectrometry to characterise protein complexes and map protein interaction networks and their dynamics.

Cyclin Interactome Dynamics

Selected Publications:

  • Assignment of protein interactions from affinity purification/mass spectrometry data.

    Pardo M and Choudhary JS

    Journal of proteome research 2012;11;3;1462-74

  • Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.

    Pagliuca FW, Collins MO, Lichawska A, Zegerman P, Choudhary JS and Pines J

    Molecular cell 2011;43;3;406-17

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Cell stem cell 2010;6;4;382-95

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Molecular systems biology 2009;5;269

Post-Translational Modifications

We study protein modifications at several levels: the proteome level, the protein level and the modification level. We develop bioinformatics methods and analytical strategies for the identification of all detectable modifications. We also use enrichment techniques to target specific modifications, such as phosphorylation, for detailed analysis.

Protein Centric Modification Detection

Selected Publications:

  • Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.

    Jones ML, Collins MO, Goulding D, Choudhary JS and Rayner JC

    Cell host & microbe 2012;12;2;246-58

  • Neurotransmitters drive combinatorial multistate postsynaptic density networks.

    Coba MP, Pocklington AJ, Collins MO, Kopanitsa MV, Uren RT, Swamy S, Croning MD, Choudhary JS and Grant SG

    Science signaling 2009;2;68;ra19

  • Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.

    Collins MO, Yu L, Campuzano I, Grant SG and Choudhary JS

    Molecular & cellular proteomics : MCP 2008;7;7;1331-48

  • Proteomic analysis of in vivo phosphorylated synaptic proteins.

    Collins MO, Yu L, Coba MP, Husi H, Campuzano I, Blackstock WP, Choudhary JS and Grant SG

    The Journal of biological chemistry 2005;280;7;5972-82

Proteogenomics

Proteogenomics uses mass spectrometry data to experimentally validate gene products and to assist in the process of genome annotation and comparison. We develop tools and methods to facilitate use of proteomics data for this application.

Gene Discovery and Characterisation

Selected Publications:

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    Genome research 2011;21;5;756-67

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    Journal of proteome research 2009;8;6;3176-81

  • Interrogating the human genome using uninterpreted mass spectrometry data.

    Choudhary JS, Blackstock WP, Creasy DM and Cottrell JS

    Proteomics 2001;1;5;651-67

Proteome Characterisation and Quantification

We also continually develop novel mass spectrometry and informatics techniques to improve protein identification and quantification. Applications include profiling changes in protein expression in diseased organisms, analysing protein localisation to subcellular organelles, examining protein synthesis and turnover, and absolute quantification of protein species.

Protein Synthesis Quantification

Selected Publications:

  • A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.

    Sebastian S, Brochet M, Collins MO, Schwach F, Jones ML, Goulding D, Rayner JC, Choudhary JS and Billker O

    Cell host & microbe 2012;12;1;9-19

  • Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.

    Chaudhuri RR, Yu L, Kanji A, Perkins TT, Gardner PP, Choudhary J, Maskell DJ and Grant AJ

    Microbiology (Reading, England) 2011;157;Pt 10;2922-32

  • Characterization of the proteome, diseases and evolution of the human postsynaptic density.

    Bayés A, van de Lagemaat LN, Collins MO, Croning MD, Whittle IR, Choudhary JS and Grant SG

    Nature neuroscience 2011;14;1;19-21

  • Evolutionary expansion and anatomical specialization of synapse proteome complexity.

    Emes RD, Pocklington AJ, Anderson CN, Bayes A, Collins MO, Vickers CA, Croning MD, Malik BR, Choudhary JS, Armstrong JD and Grant SG

    Nature neuroscience 2008;11;7;799-806

Technology

Technology and Instrumentation

MS Workflows

We have long-standing expertise in sample preparation, peptide and protein separation and purification technologies, mass spectrometry and proteomics data analysis. Our well-equipped laboratory has a range of high-resolution mass spectrometers that we combine with innovative tools and software to precisely identify and quantify proteins and their modifications in the proteome.

Informatics

  • Mascot Percolator - Allows accurate and sensitive peptide identification from low- and high-accuracy mass spectrometry data.
  • SloMo - We have adapted the original SLoMo tool for fast, high-throughput modification site localisation.
  • ModX - A toolbox for the detection and validation of protein modifications.

Methods

eTAP-MS (endogenous tandem affinity purification – mass spectrometry)

In conjunction with the mouse research teams at the Sanger Institute, we have developed a technology that enables protein interactions in cell lines and tissues to be mapped both at the large-scale systematic level and for individual genes of interest. The approach uses two affinity tags to isolate protein assemblies associated with a specific gene. Identifying and analysing these protein assemblies reveals the gene's biological function in cellular processes or signalling pathways.

Selected Method and Technology Publications:

  • Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

    Wright JC, Collins MO, Yu L, Käll L, Brosch M and Choudhary JS

    Molecular & cellular proteomics : MCP 2012;11;8;478-91

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Cell stem cell 2010;6;4;382-95

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Molecular systems biology 2009;5;269

Collaborations

Collaborating with the Proteomic Mass Spectrometry Group

We work closely with research teams across the Sanger Institute research programmes and with many organisations around the world.

If you are interested in collaborating with us, please contact Jyoti Choudhary.

External Collaborations

Please see the related projects section for internal Sanger Institute collaborations.

Opportunities

For career opportunities with our group please visit the Sanger careers pages.

We also welcome applications from self-funded postdocs.

Software

Mascot Percolator

Mascot Percolator allows accurate and sensitive peptide identification from low- and high-accuracy mass spectrometry data. It combines the database search algorithm Mascot with the machine-learning algorithm Percolator to accurately score results.
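To make the idea of a q-value threshold concrete, here is a minimal sketch of how peptide-spectrum matches (PSMs) might be filtered at a 1% false discovery rate. It is not Mascot Percolator's own code. It assumes each PSM already has a final score and a flag saying whether it matched a decoy sequence (a target-decoy search is an assumption of this example), and the function names `q_values` and `accepted_targets` are hypothetical.

```python
from typing import List, Tuple

# Illustrative only: (score, is_decoy) pairs stand in for scored PSMs.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by descending score, each with an estimated q-value."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this rank or any lower-scoring rank,
    # which makes it non-decreasing as the score drops.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()

    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def accepted_targets(psms: List[PSM], threshold: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value is at or below the threshold."""
    return [s for s, d, qv in q_values(psms) if not d and qv <= threshold]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, qv in q_values(example):
        print(score, "decoy" if is_decoy else "target", round(qv, 3))
```

A post-search rescoring step can increase the number of PSMs that pass such a threshold by separating target and decoy score distributions more cleanly; the filtering step itself stays the same.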

Turbo SloMo

We have adapted the original SLoMo tool for fast, high-throughput modification site localisation.

ModX

ModX is a set of Perl scripts and libraries to automatically process the output of multiple PTM detection algorithms and validate detections using Mascot Percolator.

  • Currently in development

Datasets

Downloadable Proteomics Datasets

EBI - PRIDE

PeptideAtlas FTP

Training

Wellcome Trust Advanced Courses

References

2013 Publications

  • Mechanisms controlling the temporal degradation of Nek2A and Kif18A by the APC/C-Cdc20 complex.

    Sedgwick GG, Hayward DG, Di Fiore B, Pardo M, Yu L, Pines J and Nilsson J

    The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

    The Anaphase Promoting Complex/Cyclosome (APC/C) in complex with its co-activator Cdc20 is responsible for targeting proteins for ubiquitin-mediated degradation during mitosis. The activity of APC/C-Cdc20 is inhibited during prometaphase by the Spindle Assembly Checkpoint (SAC) yet certain substrates escape this inhibition. Nek2A degradation during prometaphase depends on direct binding of Nek2A to the APC/C via a C-terminal MR dipeptide but whether this motif alone is sufficient is not clear. Here, we identify Kif18A as a novel APC/C-Cdc20 substrate and show that Kif18A degradation depends on a C-terminal LR motif. However in contrast to Nek2A, Kif18A is not degraded until anaphase showing that additional mechanisms contribute to Nek2A degradation. We find that dimerization via the leucine zipper, in combination with the MR motif, is required for stable Nek2A binding to and ubiquitination by the APC/C. Nek2A and the mitotic checkpoint complex (MCC) have an overlap in APC/C subunit requirements for binding and we propose that Nek2A binds with high affinity to apo-APC/C and is degraded by the pool of Cdc20 that avoids inhibition by the SAC.

    Funded by: Cancer Research UK: 13678; Wellcome Trust: 079643/Z/06/Z, 092096

    The EMBO journal 2013;32;2;303-14

2012 Publications

  • The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.

    Côté RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HW, van Breukelen B, Heck AJ, Hulstaert N, Martens L, Reisinger F, Csordas A, Ovelleiro D, Perez-Riverol Y, Barsnes H, Hermjakob H and Vizcaíno JA

    Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA

    Molecular & cellular proteomics : MCP 2012;11;12;1682-9

  • Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.

    Jones ML, Collins MO, Goulding D, Choudhary JS and Rayner JC

    Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.

    Funded by: Wellcome Trust: 079643/Z/06/Z, 089084

    Cell host & microbe 2012;12;2;246-58

  • Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

    Wright JC, Collins MO, Yu L, Käll L, Brosch M and Choudhary JS

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

    Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Molecular & cellular proteomics : MCP 2012;11;8;478-91

  • A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.

    Sebastian S, Brochet M, Collins MO, Schwach F, Jones ML, Goulding D, Rayner JC, Choudhary JS and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Calcium-dependent protein kinases (CDPKs) play key regulatory roles in the life cycle of the malaria parasite, but in many cases their precise molecular functions are unknown. Using the rodent malaria parasite Plasmodium berghei, we show that CDPK1, which is known to be essential in the asexual blood stage of the parasite, is expressed in all life stages and is indispensable during the sexual mosquito life-cycle stages. Knockdown of CDPK1 in sexual stages resulted in developmentally arrested parasites and prevented mosquito transmission, and these effects were independent of the previously proposed function for CDPK1 in regulating parasite motility. In-depth translational and transcriptional profiling of arrested parasites revealed that CDPK1 translationally activates mRNA species in the developing zygote that in macrogametes remain repressed via their 3' and 5'UTRs. These findings indicate that CDPK1 is a multifunctional protein that translationally regulates mRNAs to ensure timely and stage-specific protein expression.

    Funded by: Medical Research Council: G0501670; Wellcome Trust: 079643/Z/06/Z, WT098051

    Cell host & microbe 2012;12;1;9-19

  • Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.

    Wilson CH, Crombie C, van der Weyden L, Poulogiannis G, Rust AG, Pardo M, Gracia T, Yu L, Choudhary J, Poulin GB, McIntyre RE, Winton DJ, March HN, Arends MJ, Fraser AG and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus NRBP1 is a conserved regulator of cell fate, that plays an important role in tumour suppression.

    Funded by: Cancer Research UK: 13031; Medical Research Council: G0600127; Wellcome Trust

    The EMBO journal 2012;31;11;2486-97

  • Assignment of protein interactions from affinity purification/mass spectrometry data.

    Pardo M and Choudhary JS

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom. mp3@sanger.ac.uk

    The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These identify not only core complex subunits but also the more inaccessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to matrix, tag, and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at the low level and in a larger scale. This review aims to summarize the variety of methods that have been used to distinguish contaminants from specific interactions in the past few years, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to give awareness on the processing that takes place before an interaction list is reported and on the different types of list curation approaches suited to the different experiments.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Journal of proteome research 2012;11;3;1462-74

  • De novo CNV analysis implicates specific abnormalities of postsynaptic signalling complexes in the pathogenesis of schizophrenia.

    Kirov G, Pocklington AJ, Holmans P, Ivanov D, Ikeda M, Ruderfer D, Moran J, Chambert K, Toncheva D, Georgieva L, Grozeva D, Fjodorova M, Wollerton R, Rees E, Nikolov I, van de Lagemaat LN, Bayés A, Fernandez E, Olason PI, Böttcher Y, Komiyama NH, Collins MO, Choudhary J, Stefansson K, Stefansson H, Grant SG, Purcell S, Sklar P, O'Donovan MC and Owen MJ

    Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK. kirov@cardiff.ac.uk

    A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.

    Funded by: Medical Research Council: G0800509; NIMH NIH HHS: MH066392-05A1

    Molecular psychiatry 2012;17;2;142-53

  • Comparative study of human and mouse postsynaptic proteomes finds high compositional conservation and abundance differences for key synaptic proteins.

    Bayés A, Collins MO, Croning MD, van de Lagemaat LN, Choudhary JS and Grant SG

    Molecular Physiology of the Synapse Laboratory, Institut de Recerca de l'Hospital de la Santa Creu i Sant Pau, UAB, Barcelona, Catalonia, Spain. ABayesP@santpau.cat

    Direct comparison of protein components from human and mouse excitatory synapses is important for determining the suitability of mice as models of human brain disease and to understand the evolution of the mammalian brain. The postsynaptic density is a highly complex set of proteins organized into molecular networks that play a central role in behavior and disease. We report the first direct comparison of the proteome of triplicate isolates of mouse and human cortical postsynaptic densities. The mouse postsynaptic density comprised 1556 proteins and the human one 1461. A large compositional overlap was observed; more than 70% of human postsynaptic density proteins were also observed in the mouse postsynaptic density. Quantitative analysis of postsynaptic density components in both species indicates a broadly similar profile of abundance but also shows that there is higher abundance variation between species than within species. Well known components of this synaptic structure are generally more abundant in the mouse postsynaptic density. Significant inter-species abundance differences exist in some families of key postsynaptic density proteins including glutamatergic neurotransmitter receptors and adaptor proteins. Furthermore, we have identified a closely interacting set of molecules enriched in the human postsynaptic density that could be involved in dendrite and spine structural plasticity. Understanding synapse proteome diversity within and between species will be important to further our understanding of brain complexity and disease.

    Funded by: Medical Research Council: G0802238; Wellcome Trust

    PloS one 2012;7;10;e46683

  • SynGAP isoforms exert opposing effects on synaptic strength.

    McMahon AC, Barnett MW, O'Leary TS, Stoney PN, Collins MO, Papadia S, Choudhary JS, Komiyama NH, Grant SG, Hardingham GE, Wyllie DJ and Kind PC

    Centre for Integrative Physiology, University of Edinburgh, Edinburgh EH8 9XD, UK.

    Alternative promoter usage and alternative splicing enable diversification of the transcriptome. Here we demonstrate that the function of Synaptic GTPase-Activating Protein (SynGAP), a key synaptic protein, is determined by the combination of its amino-terminal sequence with its carboxy-terminal sequence. 5' rapid amplification of cDNA ends and primer extension show that different N-terminal protein sequences arise through alternative promoter usage that are regulated by synaptic activity and postnatal age. Heterogeneity in C-terminal protein sequence arises through alternative splicing. Overexpression of SynGAP α1 versus α2 C-termini-containing proteins in hippocampal neurons has opposing effects on synaptic strength, decreasing and increasing miniature excitatory synaptic currents amplitude/frequency, respectively. The magnitude of this C-terminal-dependent effect is modulated by the N-terminal peptide sequence. This is the first demonstration that activity-dependent alternative promoter usage can change the function of a synaptic protein at excitatory synapses. Furthermore, the direction and degree of synaptic modulation exerted by different protein isoforms from a single gene locus is dependent on the combination of differential promoter usage and alternative splicing.

    Funded by: Medical Research Council: G0300466, G0601584, G0700967, G0902044, G0902044(94018); Wellcome Trust

    Nature communications 2012;3;900

2011 Publications

  • Coordinating cell cycle progression via cyclin specificity.

    Pagliuca FW, Collins MO and Choudhary JS

    Cell cycle (Georgetown, Tex.) 2011;10;24;4195-6

  • APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.

    Mansfeld J, Collin P, Collins MO, Choudhary JS and Pines J

    The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint proteins complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z

    Nature cell biology 2011;13;10;1234-43

  • Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.

    Chaudhuri RR, Yu L, Kanji A, Perkins TT, Gardner PP, Choudhary J, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.

    Funded by: Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z

    Microbiology (Reading, England) 2011;157;Pt 10;2922-32

  • Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.

    Pagliuca FW, Collins MO, Lichawska A, Zegerman P, Choudhary JS and Pines J

    The Gurdon Institute, University of Cambridge, Cambridge, UK.

    Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.

    Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z

    Molecular cell 2011;43;3;406-17

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching in excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).

    Funded by: Cancer Research UK; Wellcome Trust: 077198

    Genome research 2011;21;5;756-67

  • Citrobacter rodentium is an unstable pathogen showing evidence of significant genomic flux.

    Petty NK, Feltwell T, Pickard D, Clare S, Toribio AL, Fookes M, Roberts K, Monson R, Nair S, Kingsley RA, Bulgin R, Wiles S, Goulding D, Keane T, Corton C, Lennard N, Harris D, Willey D, Rance R, Yu L, Choudhary JS, Churcher C, Quail MA, Parkhill J, Frankel G, Dougan G, Salmond GP and Thomson NR

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Citrobacter rodentium is a natural mouse pathogen that causes attaching and effacing (A/E) lesions. It shares a common virulence strategy with the clinically significant human A/E pathogens enteropathogenic E. coli (EPEC) and enterohaemorrhagic E. coli (EHEC) and is widely used to model this route of pathogenesis. We previously reported the complete genome sequence of C. rodentium ICC168, where we found that the genome displayed many characteristics of a newly evolved pathogen. In this study, through PFGE, sequencing of isolates showing variation, whole genome transcriptome analysis and examination of the mobile genetic elements, we found that, consistent with our previous hypothesis, the genome of C. rodentium is unstable as a result of repeat-mediated, large-scale genome recombination and because of active transposition of mobile genetic elements such as the prophages. We sequenced an additional C. rodentium strain, EX-33, to reveal that the reference strain ICC168 is representative of the species and that most of the inactivating mutations were common to both isolates and likely to have occurred early on in the evolution of this pathogen. We draw parallels with the evolution of other bacterial pathogens and conclude that C. rodentium is a recently evolved pathogen that may have emerged alongside the development of inbred mice as a model for human disease.

    Funded by: Biotechnology and Biological Sciences Research Council; Medical Research Council; Wellcome Trust

    PLoS pathogens 2011;7;4;e1002018

  • Characterization of the proteome, diseases and evolution of the human postsynaptic density.

    Bayés A, van de Lagemaat LN, Collins MO, Croning MD, Whittle IR, Choudhary JS and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.

    We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.

    Funded by: Chief Scientist Office: CZB/4/486; Medical Research Council: G0802238, G0802238(89569); Wellcome Trust: 066717, 077155

    Nature neuroscience 2011;14;1;19-21

2010 Publications

  • Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.

    Tee WW, Pardo M, Theunissen TW, Yu L, Choudhary JS, Hajkova P and Surani MA

    Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.

    Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.

    Funded by: Medical Research Council: G0800784; Wellcome Trust

    Genes & development 2010;24;24;2772-7

  • A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.

    Pickard D, Toribio AL, Petty NK, van Tonder A, Yu L, Goulding D, Barrell B, Rance R, Harris D, Wetter M, Wain J, Choudhary J, Thomson N and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk

    A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;21;5746-54

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk

    The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.

    Funded by: Medical Research Council: MC_U105185859; Wellcome Trust

    Cell stem cell 2010;6;4;382-95

  • Cross species proteomics.

    Wright JC, Beynon RJ and Hubbard SJ

    Department of Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK.

    Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F004605/1

    Methods in molecular biology (Clifton, N.J.) 2010;604;123-35

  • Scoring and validation of tandem MS peptide identification methods.

    Brosch M and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.

    Methods in molecular biology (Clifton, N.J.) 2010;604;43-53
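
    As an illustration of what converting an arbitrary match score into a statistical significance measure can look like, the sketch below estimates q-values with the widely used target-decoy strategy. It is a minimal example under stated assumptions, not the method or software described in the chapter above: the function name `target_decoy_qvalues`, the input layout (a score plus a decoy flag per peptide-spectrum match) and the toy scores are all hypothetical.

```python
from typing import List, Tuple

def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Estimate q-values for peptide-spectrum matches (PSMs).

    psms: (score, is_decoy) pairs; a higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Assumption: targets and decoys were searched together and the
    decoy database is the same size as the target database.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR from the bottom of the list up.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]

# Toy data (hypothetical scores): True marks a decoy hit.
example = [(50.0, False), (45.0, False), (40.0, True),
           (35.0, False), (30.0, False), (20.0, True)]
for score, is_decoy, q in target_decoy_qvalues(example):
    print(f"score={score:5.1f} decoy={is_decoy!s:5} q={q:.2f}")
```

    Unlike the raw score, a q-value has the same meaning for any search engine or instrument setting, which is the property that makes results from different experiments comparable.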

2009 Publications

  • Cell biology. Evolving cell signals.

    Collins MO

    Proteomic Mass Spectrometry Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. moc@sanger.ac.uk

    Science (New York, N.Y.) 2009;325;5948;1635-6

  • Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.

    Lawley TD, Croucher NJ, Yu L, Clare S, Sebaihia M, Goulding D, Pickard DJ, Parkhill J, Choudhary J and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom. tl2@sanger.ac.uk

    Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm² reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;191;17;5377-86

  • A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

    Funded by: Wellcome Trust

    PLoS genetics 2009;5;7;e1000569

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well-performing machine learning method for rescoring database search results, and demonstrate it to be amenable to both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand-alone tool or integrated into existing data analysis pipelines.

    Funded by: Wellcome Trust: 077198

    Journal of proteome research 2009;8;6;3176-81

  • Recent developments in proteome informatics for mass spectrometry analysis.

    Wright JC and Hubbard SJ

    Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK.

    Mass spectrometry has become the pre-eminent analytical method for the study of proteins and proteomes in post-genome science. The high volumes of complex spectra and data generated from such experiments represent new challenges for the field of bioinformatics. The past decade has seen an explosion of informatics tools targeted towards the processing, analysis, storage, and integration of mass spectrometry based proteomic data. In this review, some of the more recent developments in proteome informatics will be discussed. This includes new tools for predicting the properties of proteins and peptides which can be exploited in experimental proteomic design, and tools for the identification of peptides and proteins from their mass spectra. Similarly, informatics approaches are required for the move towards quantitative proteomics which are also briefly discussed. Finally, the growing number of proteomic data repositories and emerging data standards developed for the field are highlighted. These tools and technologies point the way towards the next phase of experimental proteomics and informatics challenges that the proteomics community will face.

    Combinatorial chemistry & high throughput screening 2009;12;2;194-202

  • Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Wright JC, Sugden D, Francis-McIntyre S, Riba-Garcia I, Gaskell SJ, Grigoriev IV, Baker SE, Beynon RJ and Hubbard SJ

    Dept Veterinary Preclinical Sciences, University of Liverpool, Liverpool, UK. james.wright@manchester.ac.uk

    Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).

    Results: 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.

    Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D006996/1, CFB17723

    BMC genomics 2009;10;61

  • Neurotransmitters drive combinatorial multistate postsynaptic density networks.

    Coba MP, Pocklington AJ, Collins MO, Kopanitsa MV, Uren RT, Swamy S, Croning MD, Choudhary JS and Grant SG

    Genes to Cognition, Wellcome Trust Sanger Institute, Cambridgeshire, UK.

    The mammalian postsynaptic density (PSD) comprises a complex collection of approximately 1100 proteins. Despite extensive knowledge of individual proteins, the overall organization of the PSD is poorly understood. Here, we define maps of molecular circuitry within the PSD based on phosphorylation of postsynaptic proteins. Activation of a single neurotransmitter receptor, the N-methyl-D-aspartate receptor (NMDAR), changed the phosphorylation status of 127 proteins. Stimulation of ionotropic and metabotropic glutamate receptors and dopamine receptors activated overlapping networks with distinct combinatorial phosphorylation signatures. Using peptide array technology, we identified specific phosphorylation motifs and switching mechanisms responsible for the integration of neurotransmitter receptor pathways and their coordination of multiple substrates in these networks. These combinatorial networks confer high information-processing capacity and functional diversity on synapses, and their elucidation may provide new insights into disease mechanisms and new opportunities for drug discovery.

    Funded by: Medical Research Council: G90/93; Wellcome Trust: 066717

    Science signaling 2009;2;68;ra19

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.

    The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.

    Funded by: Wellcome Trust

    Molecular systems biology 2009;5;269

2008 Publications

  • Mapping multiprotein complexes by affinity purification and mass spectrometry.

    Collins MO and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

    The combination of affinity purification and tandem mass spectrometry (MS) has emerged as a powerful approach to delineate biological processes. In particular, the use of epitope tags has allowed this approach to become scalable and has bypassed difficulties associated with generation of antibodies. Single epitope tags and tandem affinity purification (TAP) tags have been used to systematically map protein complexes generating protein interaction data at a near proteome-wide scale. Recent developments in the design of tags, optimisation of purification conditions, experimental design and data analysis have greatly improved the sensitivity and specificity of this approach. Concomitant developments in MS, including high accuracy and high-throughput instrumentation together with quantitative MS methods, have facilitated large-scale and comprehensive analysis of multiprotein complexes.

    Current opinion in biotechnology 2008;19;4;324-30

  • Evolutionary expansion and anatomical specialization of synapse proteome complexity.

    Emes RD, Pocklington AJ, Anderson CN, Bayes A, Collins MO, Vickers CA, Croning MD, Malik BR, Choudhary JS, Armstrong JD and Grant SG

    Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.

    Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.

    Funded by: Medical Research Council: G90/112, G90/93; Wellcome Trust: 077155

    Nature neuroscience 2008;11;7;799-806

  • Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.

    Collins MO, Yu L, Campuzano I, Grant SG and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.

    Funded by: Wellcome Trust

    Molecular & cellular proteomics : MCP 2008;7;7;1331-48

  • Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.

    Brosch M, Swamy S, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    It is a major challenge to develop effective sequence database search algorithms to translate molecular weight and fragment mass information obtained from tandem mass spectrometry into high quality peptide and protein assignments. We investigated the peptide identification performance of Mascot and X!Tandem for mass tolerance settings common for low and high accuracy mass spectrometry. We demonstrated that sensitivity and specificity of peptide identification can vary substantially for different mass tolerance settings, but this effect was more significant for Mascot. We present an adjusted Mascot threshold, which allows the user to freely select the best trade-off between sensitivity and specificity. The adjusted Mascot threshold was compared with the default Mascot and X!Tandem scoring thresholds and shown to be more sensitive at the same false discovery rates for both low and high accuracy mass spectrometry data.

    Funded by: Wellcome Trust: 077198

    Molecular & cellular proteomics : MCP 2008;7;5;962-70

  • Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.

    Pickard D, Thomson NR, Baker S, Wain J, Pardo M, Goulding D, Hamlin N, Choudhary J, Threfall J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk

    Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.

    Funded by: Wellcome Trust

    Journal of bacteriology 2008;190;7;2580-7

Team

Team members

Jyoti Choudhary
Head of Mass Spectrometry
Mark Collins
moc@sanger.ac.uk
Mercedes Pardo Calvo
Senior Staff Scientist
Chris Schlaffner
cs25@sanger.ac.uk PhD Student/Data Analyst
Hendrik Weisser
hw5@sanger.ac.uk Senior Bioinformatician
James Wright
jw13@sanger.ac.uk Senior Bioinformatician
Lu Yu
Senior Staff Scientist

Jyoti Choudhary

- Head of Mass Spectrometry

She received her Ph.D. from Imperial College London, in the Biological Mass Spectrometry group of Prof. Howard Morris. She continued her research as a post-doctoral fellow, developing methods to purify and characterise membrane protein complexes by mass spectrometry. In 1997 she joined the Bioanalytical Sciences division at GlaxoWellcome and was then recruited to the CellMap project, which was founded to pursue the development of proteomics technologies and investigate their value in drug discovery. This unit was spun out of GlaxoSmithKline, and she became a founding member of Cellzome AG in the UK.

Research

Dr. Choudhary’s research group at the Sanger Institute, Cambridge, UK, focuses on developing and applying biochemical and analytical methods for proteomics applications.

Mark Collins

moc@sanger.ac.uk

I graduated with a Joint Honours degree in Biochemistry and Molecular Genetics from University College Dublin in 2000, during which I gained laboratory experience at the Johns Hopkins University School of Medicine. I spent a year working at the Centre for Liver Disease at the Mater Misericordiae hospital in Dublin before pursuing a PhD in Molecular Neuroscience at the University of Edinburgh under the supervision of Prof. Seth Grant. During my PhD I exploited and developed emerging biochemical approaches to characterise the mammalian postsynaptic proteome in terms of its components, post-translational modifications and organisation into multi-protein complexes.

Research

Since joining the Proteomic Mass Spectrometry group in 2005, I have combined my expertise in biochemistry with state-of-the-art mass spectrometry to tackle a range of biological problems. My research interests encompass comprehensive proteome interrogation and quantification, post-translational modifications (PTMs) and protein complexes. I am particularly interested in developing and applying novel methods to enrich for modified proteins and peptides (phosphorylation, palmitoylation) and in large-scale quantitative analysis of PTMs in perturbation experiments. In addition, I have a long-standing interest in dissecting post-synaptic protein complexes using combinations of peptide-affinity and tandem affinity purification with stable isotope labelling strategies for differential quantification.

References

  • Analysis of protein palmitoylation reveals a pervasive role in Plasmodium development and pathogenesis.

    Jones ML, Collins MO, Goulding D, Choudhary JS and Rayner JC

    Malaria Programme, The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    Asexual stage Plasmodium falciparum replicates and undergoes a tightly regulated developmental process in human erythrocytes. One mechanism involved in the regulation of this process is posttranslational modification (PTM) of parasite proteins. Palmitoylation is a PTM in which cysteine residues undergo a reversible lipid modification, which can regulate target proteins in diverse ways. Using complementary palmitoyl protein purification approaches and quantitative mass spectrometry, we examined protein palmitoylation in asexual-stage P. falciparum parasites and identified over 400 palmitoylated proteins, including those involved in cytoadherence, drug resistance, signaling, development, and invasion. Consistent with the prevalence of palmitoylated proteins, palmitoylation is essential for P. falciparum asexual development and influences erythrocyte invasion by directly regulating the stability of components of the actin-myosin invasion motor. Furthermore, P. falciparum uses palmitoylation in diverse ways, stably modifying some proteins while dynamically palmitoylating others. Palmitoylation therefore plays a central role in regulating P. falciparum blood stage development.

    Funded by: Wellcome Trust: 079643/Z/06/Z, 089084

    Cell host & microbe 2012;12;2;246-58

  • Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

    Wright JC, Collins MO, Yu L, Käll L, Brosch M and Choudhary JS

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

    Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Molecular & cellular proteomics : MCP 2012;11;8;478-91

  • A Plasmodium calcium-dependent protein kinase controls zygote development and transmission by translationally activating repressed mRNAs.

    Sebastian S, Brochet M, Collins MO, Schwach F, Jones ML, Goulding D, Rayner JC, Choudhary JS and Billker O

    Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Calcium-dependent protein kinases (CDPKs) play key regulatory roles in the life cycle of the malaria parasite, but in many cases their precise molecular functions are unknown. Using the rodent malaria parasite Plasmodium berghei, we show that CDPK1, which is known to be essential in the asexual blood stage of the parasite, is expressed in all life stages and is indispensable during the sexual mosquito life-cycle stages. Knockdown of CDPK1 in sexual stages resulted in developmentally arrested parasites and prevented mosquito transmission, and these effects were independent of the previously proposed function for CDPK1 in regulating parasite motility. In-depth translational and transcriptional profiling of arrested parasites revealed that CDPK1 translationally activates mRNA species in the developing zygote that in macrogametes remain repressed via their 3' and 5'UTRs. These findings indicate that CDPK1 is a multifunctional protein that translationally regulates mRNAs to ensure timely and stage-specific protein expression.

    Funded by: Medical Research Council: G0501670; Wellcome Trust: 079643/Z/06/Z, WT098051

    Cell host & microbe 2012;12;1;9-19

  • APC15 drives the turnover of MCC-CDC20 to make the spindle assembly checkpoint responsive to kinetochore attachment.

    Mansfeld J, Collin P, Collins MO, Choudhary JS and Pines J

    The Gurdon Institute and Department of Zoology, Tennis Court Road, Cambridge CB2 1QN, UK.

    Faithful chromosome segregation during mitosis depends on the spindle assembly checkpoint (SAC), which monitors kinetochore attachment to the mitotic spindle. Unattached kinetochores generate mitotic checkpoint complexes (MCCs) that bind and inhibit the anaphase-promoting complex, or cyclosome (APC/C). How the SAC proficiently inhibits the APC/C but still allows its rapid activation when the last kinetochore attaches to the spindle is important for the understanding of how cells maintain genomic stability. We show that the APC/C subunit APC15 is required for the turnover of the APC/C co-activator CDC20 and release of MCCs during SAC signalling but not for APC/C activity per se. In the absence of APC15, MCCs and ubiquitylated CDC20 remain 'locked' onto the APC/C, which prevents the ubiquitylation and degradation of cyclin B1 when the SAC is satisfied. We conclude that APC15 mediates the constant turnover of CDC20 and MCCs on the APC/C to allow the SAC to respond to the attachment state of kinetochores.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/G001537/1; Cancer Research UK: A3211; Wellcome Trust: 079643/Z/06/Z

    Nature cell biology 2011;13;10;1234-43

  • Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery.

    Pagliuca FW, Collins MO, Lichawska A, Zegerman P, Choudhary JS and Pines J

    The Gurdon Institute, University of Cambridge, Cambridge, UK.

    Cyclin-dependent kinases comprise the conserved machinery that drives progress through the cell cycle, but how they do this in mammalian cells is still unclear. To identify the mechanisms by which cyclin-cdks control the cell cycle, we performed a time-resolved analysis of the in vivo interactors of cyclins E1, A2, and B1 by quantitative mass spectrometry. This global analysis of context-dependent protein interactions reveals the temporal dynamics of cyclin function in which networks of cyclin-cdk interactions vary according to the type of cyclin and cell-cycle stage. Our results explain the temporal specificity of the cell-cycle machinery, thereby providing a biochemical mechanism for the genetic requirement for multiple cyclins in vivo and reveal how the actions of specific cyclins are coordinated to control the cell cycle. Furthermore, we identify key substrates (Wee1 and c15orf42/Sld3) that reveal how cyclin A is able to promote both DNA replication and mitosis.

    Funded by: Cancer Research UK: A7397; Wellcome Trust: 079643/Z/06/Z

    Molecular cell 2011;43;3;406-17

  • Characterization of the proteome, diseases and evolution of the human postsynaptic density.

    Bayés A, van de Lagemaat LN, Collins MO, Croning MD, Whittle IR, Choudhary JS and Grant SG

    Genes to Cognition Programme, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, UK.

    We isolated the postsynaptic density from human neocortex (hPSD) and identified 1,461 proteins. hPSD mutations cause 133 neurological and psychiatric diseases and were enriched in cognitive, affective and motor phenotypes underpinned by sets of genes. Strong protein sequence conservation in mammalian lineages, particularly in hub proteins, indicates conserved function and organization in primate and rodent models. The hPSD is an important structure for nervous system disease and behavior.

    Funded by: Chief Scientist Office: CZB/4/486; Medical Research Council: G0802238, G0802238(89569); Wellcome Trust: 066717, 077155

    Nature neuroscience 2011;14;1;19-21

  • Cell biology. Evolving cell signals.

    Collins MO

    Proteomic Mass Spectrometry Group, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. moc@sanger.ac.uk

    Science (New York, N.Y.) 2009;325;5948;1635-6

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Genes to Cognition Programme, The Wellcome Trust Sanger Institute, Cambridge, UK.

    The molecular complexity of mammalian proteomes demands new methods for mapping the organization of multiprotein complexes. Here, we combine mouse genetics and proteomics to characterize synapse protein complexes and interaction networks. New tandem affinity purification (TAP) tags were fused to the carboxyl terminus of PSD-95 using gene targeting in mice. Homozygous mice showed no detectable abnormalities in PSD-95 expression, subcellular localization or synaptic electrophysiological function. Analysis of multiprotein complexes purified under native conditions by mass spectrometry defined known and new interactors: 118 proteins comprising crucial functional components of synapses, including glutamate receptors, K+ channels, scaffolding and signaling proteins, were recovered. Network clustering of protein interactions generated five connected clusters, with two clusters containing all the major ionotropic glutamate receptors and one cluster with voltage-dependent K+ channels. Annotation of clusters with human disease associations revealed that multiple disorders map to the network, with a significant correlation of schizophrenia within the glutamate receptor clusters. This targeted TAP tagging strategy is generally applicable to mammalian proteomics and systems biology approaches to disease.

    Funded by: Wellcome Trust

    Molecular systems biology 2009;5;269

  • Evolutionary expansion and anatomical specialization of synapse proteome complexity.

    Emes RD, Pocklington AJ, Anderson CN, Bayes A, Collins MO, Vickers CA, Croning MD, Malik BR, Choudhary JS, Armstrong JD and Grant SG

    Institute for Science and Technology in Medicine, Keele University, Thornburrow Drive, Hartshill, Stoke-on-Trent ST4 7QB, UK.

    Understanding the origins and evolution of synapses may provide insight into species diversity and the organization of the brain. Using comparative proteomics and genomics, we examined the evolution of the postsynaptic density (PSD) and membrane-associated guanylate kinase (MAGUK)-associated signaling complexes (MASCs) that underlie learning and memory. PSD and MASC orthologs found in yeast carry out basic cellular functions to regulate protein synthesis and structural plasticity. We observed marked changes in signaling complexity at the yeast-metazoan and invertebrate-vertebrate boundaries, with an expansion of key synaptic components, notably receptors, adhesion/cytoskeletal proteins and scaffold proteins. A proteomic comparison of Drosophila and mouse MASCs revealed species-specific adaptation with greater signaling complexity in mouse. Although synaptic components were conserved amongst diverse vertebrate species, mapping mRNA and protein expression in the mouse brain showed that vertebrate-specific components preferentially contributed to differences between brain regions. We propose that the evolution of synapse complexity around a core proto-synapse has contributed to invertebrate-vertebrate differences and to brain specialization.

    Funded by: Medical Research Council: G90/112, G90/93; Wellcome Trust: 077155

    Nature neuroscience 2008;11;7;799-806

  • Proteomic analysis of in vivo phosphorylated synaptic proteins.

    Collins MO, Yu L, Coba MP, Husi H, Campuzano I, Blackstock WP, Choudhary JS and Grant SG

    Division of Neuroscience, University of Edinburgh, Edinburgh EH8 9JZ, UK.

    In the nervous system, protein phosphorylation is an essential feature of synaptic function. Although protein phosphorylation is known to be important for many synaptic processes and in disease, little is known about global phosphorylation of synaptic proteins. Heterogeneity and low abundance make protein phosphorylation analysis difficult, particularly for mammalian tissue samples. Using a new approach, combining both protein and peptide immobilized metal affinity chromatography and mass spectrometry data acquisition strategies, we have produced the first large scale map of the mouse synapse phosphoproteome. We report over 650 phosphorylation events corresponding to 331 sites (289 have been unambiguously assigned), 92% of which are novel. These represent 79 proteins, half of which are novel phosphoproteins, and include several highly phosphorylated proteins such as MAP1B (33 sites) and Bassoon (30 sites). An additional 149 candidate phosphoproteins were identified by profiling the composition of the protein immobilized metal affinity chromatography enrichment. All major synaptic protein classes were observed, including components of important pre- and postsynaptic complexes as well as low abundance signaling proteins. Bioinformatic and in vitro phosphorylation assays of peptide arrays suggest that a small number of kinases phosphorylate many proteins and that each substrate is phosphorylated by many kinases. These data substantially increase existing knowledge of synapse protein phosphorylation and support a model where the synapse phosphoproteome is functionally organized into a highly interconnected signaling network.

    The Journal of biological chemistry 2005;280;7;5972-82

Mercedes Pardo Calvo

- Senior Staff Scientist

I graduated with Honours in Pharmacy and then completed a PhD in Microbiology at Universidad Complutense de Madrid in 2000 under the supervision of Drs Gil and Nombela, also spending four months at McGill University. My PhD research explored yeast cell wall biogenesis using proteomics, genetics and cell biology. I then did postdoctoral research at the CRUK London Research Institute under the supervision of Sir Paul Nurse, using fission yeast as a model system. I combined genetics and cell biology to characterize the role of the microtubule cytoskeleton during cytokinesis and to identify novel regulators of its organization and dynamics.

Research

I joined the Proteomic Mass Spectrometry group in 2004, setting out to characterize protein interactions using affinity purification and mass spectrometry. In collaboration with the Skarnes and Bradley labs I developed the endogenous TAP (tandem affinity purification) technology in mouse embryonic stem cells, applying it to study chromatin-associated proteins regulating stem cell biology. I have recently shifted my interest to enzymes that introduce less well-known protein modifications. Other areas of interest include lncRNA-protein interactions. I am also involved in the Wellcome Trust Advanced Courses, teaching TAP in the Genome-wide Approaches with Fission Yeast and Protein Interactions and Networks courses.

References

  • Mechanisms controlling the temporal degradation of Nek2A and Kif18A by the APC/C-Cdc20 complex.

    Sedgwick GG, Hayward DG, Di Fiore B, Pardo M, Yu L, Pines J and Nilsson J

    The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

    The Anaphase Promoting Complex/Cyclosome (APC/C) in complex with its co-activator Cdc20 is responsible for targeting proteins for ubiquitin-mediated degradation during mitosis. The activity of APC/C-Cdc20 is inhibited during prometaphase by the Spindle Assembly Checkpoint (SAC) yet certain substrates escape this inhibition. Nek2A degradation during prometaphase depends on direct binding of Nek2A to the APC/C via a C-terminal MR dipeptide but whether this motif alone is sufficient is not clear. Here, we identify Kif18A as a novel APC/C-Cdc20 substrate and show that Kif18A degradation depends on a C-terminal LR motif. However in contrast to Nek2A, Kif18A is not degraded until anaphase showing that additional mechanisms contribute to Nek2A degradation. We find that dimerization via the leucine zipper, in combination with the MR motif, is required for stable Nek2A binding to and ubiquitination by the APC/C. Nek2A and the mitotic checkpoint complex (MCC) have an overlap in APC/C subunit requirements for binding and we propose that Nek2A binds with high affinity to apo-APC/C and is degraded by the pool of Cdc20 that avoids inhibition by the SAC.

    Funded by: Cancer Research UK: 13678; Wellcome Trust: 079643/Z/06/Z, 092096

    The EMBO journal 2013;32;2;303-14

  • Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation.

    Wilson CH, Crombie C, van der Weyden L, Poulogiannis G, Rust AG, Pardo M, Gracia T, Yu L, Choudhary J, Poulin GB, McIntyre RE, Winton DJ, March HN, Arends MJ, Fraser AG and Adams DJ

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue, and show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. We show that Nrbp1 interacts with key components of the ubiquitination machinery and that loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating, and that nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus NRBP1 is a conserved regulator of cell fate, that plays an important role in tumour suppression.

    Funded by: Cancer Research UK: 13031; Medical Research Council: G0600127; Wellcome Trust

    The EMBO journal 2012;31;11;2486-97

  • Assignment of protein interactions from affinity purification/mass spectrometry data.

    Pardo M and Choudhary JS

    Wellcome Trust Sanger Institute , Wellcome Trust Genome Campus, Hinxton, CB10 1SA Cambridgeshire, United Kingdom. mp3@sanger.ac.uk

    The combination of affinity purification with mass spectrometry analysis has become the method of choice for protein complex characterization. With the improved performance of mass spectrometry technology, the sensitivity of the analyses is increasing, probing deeper into molecular interactions and yielding longer lists of proteins. These identify not only core complex subunits but also the more inaccessible proteins that interact weakly or transiently. Alongside them, contaminant proteins, which are often abundant proteins in the cell, tend to be recovered in affinity experiments because they bind nonspecifically and with low affinity to matrix, tag, and/or antibody. The challenge now lies in discriminating nonspecific binders from true interactors, particularly at the low level and in a larger scale. This review aims to summarize the variety of methods that have been used to distinguish contaminants from specific interactions in the past few years, ranging from manual elimination using heuristic rules to more sophisticated probabilistic scoring approaches. We aim to give awareness on the processing that takes place before an interaction list is reported and on the different types of list curation approaches suited to the different experiments.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Journal of proteome research 2012;11;3;1462-74

  • Prmt5 is essential for early mouse development and acts in the cytoplasm to maintain ES cell pluripotency.

    Tee WW, Pardo M, Theunissen TW, Yu L, Choudhary JS, Hajkova P and Surani MA

    Wellcome Trust, Cancer Research UK, Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge CB2 1QN, United Kingdom.

    Prmt5, an arginine methyltransferase, has multiple roles in germ cells, and possibly in pluripotency. Here we show that loss of Prmt5 function is early embryonic-lethal due to the abrogation of pluripotent cells in blastocysts. Prmt5 is also up-regulated in the cytoplasm during the derivation of embryonic stem (ES) cells together with Stat3, where they persist to maintain pluripotency. Prmt5 in association with Mep50 methylates cytosolic histone H2A (H2AR3me2s) to repress differentiation genes in ES cells. Loss of Prmt5 or Mep50 results in derepression of differentiation genes, indicating the significance of the Prmt5/Mep50 complex for pluripotency, which may occur in conjunction with the leukemia inhibitory factor (LIF)/Stat3 pathway.

    Funded by: Medical Research Council: G0800784; Wellcome Trust

    Genes & development 2010;24;24;2772-7

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk

    The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.

    Funded by: Medical Research Council: MC_U105185859; Wellcome Trust

    Cell stem cell 2010;6;4;382-95

  • Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.

    Pickard D, Thomson NR, Baker S, Wain J, Pardo M, Goulding D, Hamlin N, Choudhary J, Threfall J and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk

    Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.

    Funded by: Wellcome Trust

    Journal of bacteriology 2008;190;7;2580-7

  • Genetic and proteomic evidences support the localization of yeast enolase in the cell surface.

    López-Villar E, Monteoliva L, Larsen MR, Sachon E, Shabaz M, Pardo M, Pla J, Gil C, Roepstorff P and Nombela C

    Departamento de Microbiología II, Facultad de Farmacia, UCM, Madrid, Spain.

    Although enolase, other glycolytic enzymes, and a variety of cytoplasmic proteins lacking an N-terminal secretion signal have been widely described as located at the cell surface in yeast and in mammalian cells, their presence in this external location is still controversial. Here, we report that different experimental approaches (genetics, cellular biology and proteomics) show that yeast enolase can reach the cell surface and describe the protein regions involved in its cell surface targeting. Hybrid enolase truncates, fused at their C terminus with the yeast internal invertase or green fluorescent protein (GFP) as reporter proteins, proved that the 169 N-terminal amino acids are sufficient to target the protein to the cell surface. Furthermore, the enolase-GFP fusion co-localized with a plasma membrane marker. Enolase was also identified among membrane proteins obtained by a purification protocol that includes sodium carbonate to prevent cytoplasmic contamination. These proteins were analyzed by SDS-PAGE, trypsin digestion and LC-MS/MS for peptide identification. Elongation factors, mitochondrial membrane proteins and a mannosyltransferase involved in cell wall mannan biosynthesis were also identified in this fraction.

    Proteomics 2006;6 Suppl 1;S107-18

  • The nuclear rim protein Amo1 is required for proper microtubule cytoskeleton organisation in fission yeast.

    Pardo M and Nurse P

    Cell Cycle Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London, WC2A 3PX, UK. mp3@sanger.ac.uk

    Microtubules have a central role in cell division and cell polarity in eukaryotic cells. The fission yeast is a useful organism for studying microtubule regulation owing to the highly organised nature of its microtubular arrays. To better understand microtubule dynamics and organisation we carried out a screen that identified over 30 genes whose overexpression resulted in microtubule cytoskeleton abnormalities. Here we describe a novel nucleoporin-like protein, Amo1, identified in this screen. Amo1 localises to the nuclear rim in a punctate pattern that does not overlap with nuclear pore complex components. Amo1Delta cells are bent, and they have fewer microtubule bundles that curl around the cell ends. The microtubules in amo1Delta cells have longer dwelling times at the cell tips, and grow in an uncoordinated fashion. Lack of Amo1 also causes a polarity defect. Amo1 is not required for the microtubule loading of several factors affecting microtubule dynamics, and does not seem to be required for nuclear pore function.

    Journal of cell science 2005;118;Pt 8;1705-14

  • PST1 and ECM33 encode two yeast cell surface GPI proteins important for cell wall integrity.

    Pardo M, Monteoliva L, Vázquez P, Martínez R, Molero G, Nombela C and Gil C

    Departamento de Microbiología II, Facultad de Farmacia, Universidad Complutense, Pza. Ramón y Cajal s/n, 28040 Madrid, Spain.

    Pst1p was previously identified as a protein secreted by yeast regenerating protoplasts, which suggests a role in cell wall construction. ECM33 encodes a protein homologous to Pst1p, and both of them display typical features of GPI-anchored proteins and a characteristic receptor L-domain. Pst1p and Ecm33p are both localized to the cell surface, Pst1p being at the cell membrane and possibly also in the periplasmic space. Here, the characterization of pst1Delta, ecm33Delta and pst1Delta ecm33Delta mutants is described. Deletion of ECM33 leads to a weakened cell wall, and this defect is further aggravated by simultaneous deletion of PST1. As a result, the ecm33Delta mutant displays increased levels of activated Slt2p, the MAP kinase of the cell integrity pathway, and relies on a functional Slt2-mediated cell integrity pathway to ensure viability. Analyses of model glycosylated proteins show glycosylation defects in the ecm33Delta mutant. Ecm33p is also important for proper cell wall ultrastructure organization and, furthermore, for the correct assembly of the mannoprotein outer layer of the cell wall. Pst1p seems to act in the compensatory mechanism activated upon cell wall damage and, in these conditions, may partially substitute for Ecm33p.

    Microbiology (Reading, England) 2004;150;Pt 12;4157-70

  • Equatorial retention of the contractile actin ring by microtubules during cytokinesis.

    Pardo M and Nurse P

    Cell Cycle Laboratory, Cancer Research UK London Research Institute, 44 Lincoln's Inn Fields, London WC2A 3PX, UK. mercedes.pardo@cancer.org.uk

    In most eukaryotes cytokinesis is brought about by a contractile actin ring located at the division plane. Here, in fission yeast the actin ring was found to be required to generate late-mitotic microtubular structures located at the division plane, and these in turn maintained the medial position of the actin ring. When these microtubular structures were disrupted, the actin ring migrated away from the cell middle in a membrane traffic-dependent manner, resulting in asymmetrical cell divisions that led to genomic instability. We propose that these microtubular structures contribute to a checkpoint control that retains the equatorial position of the ring when progression through cytokinesis is delayed.

    Science (New York, N.Y.) 2003;300;5625;1569-74

Chris Schlaffner

cs25@sanger.ac.uk PhD Student/Data Analyst

From 2009 to 2014, I studied bioinformatics (BSc and MSc) at the University of Applied Sciences Upper Austria in Hagenberg, Austria. As part of my undergraduate studies I did two internships, both at the Research Institute for Molecular Pathology (IMP) in Vienna, Austria, where I started to work on mass spectrometry data analysis. In October 2013 I joined the Proteomic Mass Spectrometry group for an applied research internship as part of my Master's degree. In October 2014 I started my PhD at the University of Cambridge and continue to work as a member of the group.

Research

My research interests mainly focus on improving computational methods for the analysis of mass spectrometry data. My Master's project focused on imperfect matching of spectra to identify unexpected modifications and sequence variation. My PhD project and work for GENCODE will be based around proteogenomic genome annotation and characterisation, as well as looking further into post-translational modifications and sequence variation and their conservation.

Hendrik Weisser

hw5@sanger.ac.uk Senior Bioinformatician

From 2003 to 2007, I studied bioinformatics (BSc and MSc) at Saarland University in Saarbrücken, Germany. Afterwards I stayed to finish a research project at the Max Planck Institute for Informatics (Thomas Lengauer group), investigating genotype-phenotype associations in HIV. In 2008 I moved to Zürich, Switzerland, to pursue a PhD in computational proteomics at ETH Zürich's Institute of Molecular Systems Biology (Lars Malmström group/Ruedi Aebersold group). I completed my doctorate at the beginning of 2013. Since June of the same year, I have been working as a bioinformatician in the Proteomic Mass Spectrometry group.

Research

I have a broad interest in improving computational methods for the analysis of mass spectrometric data, with an emphasis on quantitative proteomics. My PhD project focused on software development for protein quantification based on label-free mass spectrometric measurements. As part of my work at the Sanger Institute, I will be developing novel algorithms, adapting existing tools, and contributing to the statistical analysis of the proteomics and related data acquired by the Proteomic Mass Spectrometry group and their collaborators.

References

  • An automated pipeline for high-throughput label-free quantitative proteomics.

    Weisser H, Nahnsen S, Grossmann J, Nilse L, Quandt A, Brauer H, Sturm M, Kenar E, Kohlbacher O, Aebersold R and Malmström L

    Department of Biology, Institute of Molecular Systems Biology, ETH Zürich , 8093 Zürich, Switzerland.

    We present a computational pipeline for the quantification of peptides and proteins in label-free LC-MS/MS data sets. The pipeline is composed of tools from the OpenMS software framework and is applicable to the processing of large experiments (50+ samples). We describe several enhancements that we have introduced to OpenMS to realize the implementation of this pipeline. They include new algorithms for centroiding of raw data, for feature detection, for the alignment of multiple related measurements, and a new tool for the calculation of peptide and protein abundances. Where possible, we compare the performance of the new algorithms to that of their established counterparts in OpenMS. We validate the pipeline on the basis of two small data sets that provide ground truths for the quantification. There, we also compare our results to those of MaxQuant and Progenesis LC-MS, two popular alternatives for the analysis of label-free data. We then show how our software can be applied to a large heterogeneous data set of 58 LC-MS/MS runs.

    Journal of proteome research 2013;12;4;1628-44

  • Streptococcus pyogenes in human plasma: adaptive mechanisms analyzed by mass spectrometry-based proteomics.

    Malmström J, Karlsson C, Nordenfelt P, Ossola R, Weisser H, Quandt A, Hansson K, Aebersold R, Malmström L and Björck L

    Department of Immunotechnology, Lund University, SE-22100 Lund, Sweden. Johan.Malmstrom@immun.lth.se

    Streptococcus pyogenes is a major bacterial pathogen and a potent inducer of inflammation causing plasma leakage at the site of infection. A combination of label-free quantitative mass spectrometry-based proteomics strategies were used to measure how the intracellular proteome homeostasis of S. pyogenes is influenced by the presence of human plasma, identifying and quantifying 842 proteins. In plasma the bacterium modifies its production of 213 proteins, and the most pronounced change was the complete down-regulation of proteins required for fatty acid biosynthesis. Fatty acids are transported by albumin (HSA) in plasma. S. pyogenes expresses HSA-binding surface proteins, and HSA carrying fatty acids reduced the amount of fatty acid biosynthesis proteins to the same extent as plasma. The results clarify the function of HSA-binding proteins in S. pyogenes and underline the power of the quantitative mass spectrometry strategy used here to investigate bacterial adaptation to a given environment.

    The Journal of biological chemistry 2012;287;2;1415-25

  • Only slight impact of predicted replicative capacity for therapy response prediction.

    Weisser H, Altmann A, Sierra S, Incardona F, Struck D, Sönnerborg A, Kaiser R, Zazzi M, Tschochner M, Walter H and Lengauer T

    Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.

    Background: Replication capacity (RC) of specific HIV isolates is occasionally blamed for unexpected treatment responses. However, the role of viral RC in response to antiretroviral therapy is not yet fully understood.

    We developed a method for predicting RC from genotype using support vector machines (SVMs) trained on about 300 genotype-RC pairs. Next, we studied the impact of predicted viral RC (pRC) on the change of viral load (VL) and CD4(+) T-cell count (CD4) during the course of therapy on about 3,000 treatment change episodes (TCEs) extracted from the EuResist integrated database. Specifically, linear regression models using either treatment activity scores (TAS), the drug combination, or pRC or any combination of these covariates were trained to predict change in VL and CD4, respectively.

    Results: The SVM models achieved a Spearman correlation (rho) of 0.54 between measured RC and pRC. The prediction of change in VL (CD4) was best at 180 (360) days, reaching a correlation of rho = 0.45 (rho = 0.27). In general, pRC was inversely correlated to drug resistance at treatment start (on average rho = -0.38). Inclusion of pRC in the linear regression models significantly improved prediction of virological response to treatment based either on the drug combination or on the TAS (t-test; p-values range from 0.0247 to 4 × 10(-6)) but not for the model using both TAS and drug combination. For predicting the change in CD4 the improvement derived from inclusion of pRC was not significant.

    Conclusion: Viral RC could be predicted from genotype with moderate accuracy and could slightly improve prediction of virological treatment response. However, the observed improvement could simply be a consequence of the significant correlation between pRC and drug resistance.

    PloS one 2010;5;2;e9044

James Wright

jw13@sanger.ac.uk Senior Bioinformatician

In 2000 I began a degree in Biological and Computational Science at UMIST, which included a one-year placement with EST Informatics at AstraZeneca focussing on the exploitation of microarray data. My dissertation used machine learning methods to classify genomic sequences. I then studied for a master’s in Physical Methods for Bioanalysis and Post Genomic Science, investigating the use of domains to classify phosphatases. In 2005 I began a PhD tackling cross-species proteomics using lab-based and in-silico strategies with Rob Beynon at Liverpool University and Simon Hubbard at the University of Manchester. In 2009 I joined the Sanger Institute.

Research

My research interests include most aspects of proteomic bioinformatics and data analysis. I am currently working on projects involving unexpected PTM detection, validation, and localisation (ModX, Turbo-SloMo), machine learning methods to improve protein identification (Mascot Percolator), label free protein quantification and GO term enrichment, and proteogenomic genome annotation and characterisation. I also provide proteomic informatics support to a wide range of internal and external proteomics projects.

References

  • The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium.

    Côté RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HW, van Breukelen B, Heck AJ, Hulstaert N, Martens L, Reisinger F, Csordas A, Ovelleiro D, Perez-Rivevol Y, Barsnes H, Hermjakob H and Vizcaíno JA

    Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/I024204/1; Wellcome Trust: WT085949MA

    Molecular & cellular proteomics : MCP 2012;11;12;1682-9

  • Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

    Wright JC, Collins MO, Yu L, Käll L, Brosch M and Choudhary JS

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

    Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Molecular & cellular proteomics : MCP 2012;11;8;478-91

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.

    Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).

    Funded by: Cancer Research UK; Wellcome Trust: 077198

    Genome research 2011;21;5;756-67

  • Cross species proteomics.

    Wright JC, Beynon RJ and Hubbard SJ

    Department Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK.

    Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/F004605/1

    Methods in molecular biology (Clifton, N.J.) 2010;604;123-35

  • Recent developments in proteome informatics for mass spectrometry analysis.

    Wright JC and Hubbard SJ

    Faculty of Life Sciences, University of Manchester, Manchester M139PT, UK.

    Mass spectrometry has become the pre-eminent analytical method for the study of proteins and proteomes in post-genome science. The high volumes of complex spectra and data generated from such experiments represent new challenges for the field of bioinformatics. The past decade has seen an explosion of informatics tools targeted towards the processing, analysis, storage, and integration of mass spectrometry based proteomic data. In this review, some of the more recent developments in proteome informatics will be discussed. This includes new tools for predicting the properties of proteins and peptides which can be exploited in experimental proteomic design, and tools for the identification of peptides and proteins from their mass spectra. Similarly, informatics approaches are required for the move towards quantitative proteomics which are also briefly discussed. Finally, the growing number of proteomic data repositories and emerging data standards developed for the field are highlighted. These tools and technologies point the way towards the next phase of experimental proteomics and informatics challenges that the proteomics community will face.

    Combinatorial chemistry & high throughput screening 2009;12;2;194-202

  • Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Wright JC, Sugden D, Francis-McIntyre S, Riba-Garcia I, Gaskell SJ, Grigoriev IV, Baker SE, Beynon RJ and Hubbard SJ

    Dept Veterinary Preclinical Sciences, University of Liverpool, Liverpool, UK. james.wright@manchester.ac.uk

    Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).

    Results: 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.

    Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

    Funded by: Biotechnology and Biological Sciences Research Council: BB/D006996/1, CFB17723

    BMC genomics 2009;10;61
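Several of the abstracts above report peptide identifications at a 1% false discovery rate (a 0.01 q-value threshold), estimated with reversed (decoy) database searches. The following is a minimal sketch of how such a threshold can be applied to a scored list of peptide-spectrum matches. It is not the Mascot Percolator implementation; the function names, the simple decoys/targets estimator and the toy scores are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: (score, is_decoy), where is_decoy marks a
# match to a reversed (decoy) sequence rather than a real (target) one.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by descending score, each with a q-value."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple estimator (an assumption): decoys / targets at this cut-off.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at this score or any more permissive cut-off.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def count_accepted(psms: List[PSM], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1 for _, is_decoy, q in q_values(psms)
        if not is_decoy and q <= threshold
    )


if __name__ == "__main__":
    toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in q_values(toy):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
    print("accepted at q <= 0.01:", count_accepted(toy))
```

In this toy list only the two target matches scoring above the first decoy pass a 0.01 threshold; relaxing the threshold admits lower-scoring matches at the cost of more expected false positives.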

Lu Yu

Senior Staff Scientist

After obtaining a BSc and MSc at Fudan University, I worked on organic mass spectrometry at the Shanghai Institute of Organic Chemistry. My PhD, on protein epitope mapping by mass spectrometry, was supervised by Professor Simon Gaskell at UMIST, Manchester. I joined the Cell Map Project at GSK in 1999 and then Cellzome UK in 2001. During this period, I gained experience in high-throughput nano-scale LC-MS/MS analysis of protein complexes while deciphering the APP processing pathway of Alzheimer’s disease. I also implemented 2DLC-MS/MS for proteome profiling of human cellular extracts, and optimised a nano-scale LC-MS/MS strategy for phosphoproteomics.

Research

I joined the Proteomic Mass Spectrometry team in early 2004. I have applied my broad expertise in the analysis of biomolecules, particularly in sample preparation and in the development and application of multidimensional HPLC coupled with mass spectrometry, to the characterisation and quantification of proteins and PTMs. Projects include protein complexes from mammalian cells, genome annotation, protein identification and quantification (using chemical derivatisation or label-free approaches) in bacterial studies, and de novo peptide sequencing. I also support bioinformatics development in the team for efficient data mining and result generation, and I manage and maintain the mass spectrometers and allied instruments in the lab.

References

  • Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

    Wright JC, Collins MO, Yu L, Käll L, Brosch M and Choudhary JS

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

    Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

    Funded by: Wellcome Trust: 079643/Z/06/Z

    Molecular & cellular proteomics : MCP 2012;11;8;478-91

  • Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome.

    Chaudhuri RR, Yu L, Kanji A, Perkins TT, Gardner PP, Choudhary J, Maskell DJ and Grant AJ

    Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community.

    Funded by: Medical Research Council: G0801161; Wellcome Trust: 079643/Z/06/Z

    Microbiology (Reading, England) 2011;157;Pt 10;2922-32

  • A conserved acetyl esterase domain targets diverse bacteriophages to the Vi capsular receptor of Salmonella enterica serovar Typhi.

    Pickard D, Toribio AL, Petty NK, van Tonder A, Yu L, Goulding D, Barrell B, Rance R, Harris D, Wetter M, Wain J, Choudhary J, Thomson N and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Sulston Building, Hinxton, Cambridge CB10 1SA, United Kingdom. djp@sanger.ac.uk

    A number of bacteriophages have been identified that target the Vi capsular antigen of Salmonella enterica serovar Typhi. Here we show that these Vi phages represent a remarkably diverse set of phages belonging to three phage families, including Podoviridae and Myoviridae. Genome analysis facilitated the further classification of these phages and highlighted aspects of their independent evolution. Significantly, a conserved protein domain carrying an acetyl esterase was found to be associated with at least one tail fiber gene for all Vi phages, and the presence of this domain was confirmed in representative phage particles by mass spectrometric analysis. Thus, we provide a simple explanation and paradigm of how a diverse group of phages target a single key virulence antigen associated with this important human-restricted pathogen.

    Funded by: Wellcome Trust

    Journal of bacteriology 2010;192;21;5746-54

  • An expanded Oct4 interaction network: implications for stem cell biology, development, and disease.

    Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM and Choudhary J

    Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK. mp3@sanger.ac.uk

    The transcription factor Oct4 is key in embryonic stem cell identity and reprogramming. Insight into its partners should illuminate how the pluripotent state is established and regulated. Here, we identify a considerably expanded set of Oct4-binding proteins in mouse embryonic stem cells. We find that Oct4 associates with a varied set of proteins including regulators of gene expression and modulators of Oct4 function. Half of its partners are transcriptionally regulated by Oct4 itself or other stem cell transcription factors, whereas one-third display a significant change in expression upon cell differentiation. The majority of Oct4-associated proteins studied to date show an early lethal phenotype when mutated. A fraction of the human orthologs is associated with inherited developmental disorders or causative of cancer. The Oct4 interactome provides a resource for dissecting mechanisms of Oct4 function, enlightening the basis of pluripotency and development, and identifying potential additional reprogramming factors.

    Funded by: Medical Research Council: MC_U105185859; Wellcome Trust

    Cell stem cell 2010;6;4;382-95

  • Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.

    Lawley TD, Croucher NJ, Yu L, Clare S, Sebaihia M, Goulding D, Pickard DJ, Parkhill J, Choudhary J and Dougan G

    Microbial Pathogenesis Laboratory, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom. tl2@sanger.ac.uk

    Clostridium difficile, a major cause of antibiotic-associated diarrhea, produces highly resistant spores that contaminate hospital environments and facilitate efficient disease transmission. We purified C. difficile spores using a novel method and show that they exhibit significant resistance to harsh physical or chemical treatments and are also highly infectious, with <7 environmental spores per cm(2) reproducibly establishing a persistent infection in exposed mice. Mass spectrometric analysis identified approximately 336 spore-associated polypeptides, with a significant proportion linked to translation, sporulation/germination, and protein stabilization/degradation. In addition, proteins from several distinct metabolic pathways associated with energy production were identified. Comparison of the C. difficile spore proteome to those of other clostridial species defined 88 proteins as the clostridial spore "core" and 29 proteins as C. difficile spore specific, including proteins that could contribute to spore-host interactions. Thus, our results provide the first molecular definition of C. difficile spores, opening up new opportunities for the development of diagnostic and therapeutic approaches.

    Funded by: Wellcome Trust

    Journal of bacteriology 2009;191;17;5377-86

  • A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR and Dougan G

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

    Funded by: Wellcome Trust

    PLoS genetics 2009;5;7;e1000569

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom.

    Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well performing machine learning method for rescoring database search results, and demonstrate it to be amenable for both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand alone tool or integrated into existing data analysis pipelines.

    Funded by: Wellcome Trust: 077198

    Journal of proteome research 2009;8;6;3176-81

  • Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.

    Collins MO, Yu L, Campuzano I, Grant SG and Choudhary JS

    Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom.

    We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.

    Funded by: Wellcome Trust

    Molecular & cellular proteomics : MCP 2008;7;7;1331-48

  • Proteomic analysis of in vivo phosphorylated synaptic proteins.

    Collins MO, Yu L, Coba MP, Husi H, Campuzano I, Blackstock WP, Choudhary JS and Grant SG

    Division of Neuroscience, University of Edinburgh, Edinburgh EH8 9JZ, UK.

    In the nervous system, protein phosphorylation is an essential feature of synaptic function. Although protein phosphorylation is known to be important for many synaptic processes and in disease, little is known about global phosphorylation of synaptic proteins. Heterogeneity and low abundance make protein phosphorylation analysis difficult, particularly for mammalian tissue samples. Using a new approach, combining both protein and peptide immobilized metal affinity chromatography and mass spectrometry data acquisition strategies, we have produced the first large scale map of the mouse synapse phosphoproteome. We report over 650 phosphorylation events corresponding to 331 sites (289 have been unambiguously assigned), 92% of which are novel. These represent 79 proteins, half of which are novel phosphoproteins, and include several highly phosphorylated proteins such as MAP1B (33 sites) and Bassoon (30 sites). An additional 149 candidate phosphoproteins were identified by profiling the composition of the protein immobilized metal affinity chromatography enrichment. All major synaptic protein classes were observed, including components of important pre- and postsynaptic complexes as well as low abundance signaling proteins. Bioinformatic and in vitro phosphorylation assays of peptide arrays suggest that a small number of kinases phosphorylate many proteins and that each substrate is phosphorylated by many kinases. These data substantially increase existing knowledge of synapse protein phosphorylation and support a model where the synapse phosphoproteome is functionally organized into a highly interconnected signaling network.

    The Journal of biological chemistry 2005;280;7;5972-82

  • The three-dimensional structure and X-ray sequence reveal that trichomaglin is a novel S-like ribonuclease.

    Gan JH, Yu L, Wu J, Xu H, Choudhary JS, Blackstock WP, Liu WY and Xia ZX

    State Key Laboratory of Bio-organic and Natural Products Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China.

    Trichomaglin is a protein isolated from root tuber of the plant Maganlin (Trichosanthes Lepiniate, Cucurbitaceae). The crystal structure of trichomaglin has been determined by multiple-isomorphous replacement and refined at 2.2 Å resolution. The X-ray sequence was established, based on electron density combined with the experimentally determined N-terminal sequence, and the sequence information derived from mass spectroscopic analysis. X-ray sequence-based homolog search and the three-dimensional structure reveal that trichomaglin is a novel S-like RNase, which was confirmed by biological assay. Trichomaglin molecule contains an additional beta sheet in the HV(b) region, compared with the known plant RNase structures. Fourteen cysteine residues form seven disulfide bridges, more than those in the other known structures of S- and S-like RNases. His43 and His105 are expected to be the catalytic acid and base, respectively. Four hydrosulfate ions are bound in the active site pocket, three of them mimicking the substrate binding sites.

    Structure (London, England : 1993) 2004;12;6;1015-25
