Proteomic mass spectrometry

The dissection of molecular pathways and cellular processes encompasses a multitude of scientific disciplines and levels of complexity. To complement large scale genetic and genomic programmes at the Wellcome Trust Sanger Institute, the Proteomic mass spectrometry group focuses on biological problems that are only tractable at the level of proteins.

The protein complement of the genome is highly complex owing to tissue-, cell- and sub-cellular compartment-specific expression of the genome. In addition, the post-translational modifications that fundamentally regulate biological processes are much more abundant than previously estimated, and the combinatorial effect of these modifications is emerging as an important mode of specificity determination in signal transduction. Our research programme develops and brings together a range of disciplines (mass spectrometry, biochemistry, molecular biology and informatics) that provide a powerful platform for proteome discovery and measurement. Coherent technology development within all of these disciplines is essential for our diverse research activities. These include the analysis of protein-protein interactions, cell signaling and proteome dynamics in ES cells, pathogens, model organisms and human cell lines and tissues.

[ Markus Brosch, Wellcome Trust Sanger Institute]

Bioinformatics

Like most post-genomic sciences, proteomics is heavily dependent on informatics. Our group applies a wide range of informatics tools and applications to our data. We collaborate with software providers to help beta test and validate new proteomics software, and we also develop new algorithms and tools ourselves.

Peptide and Protein Identification

There is a myriad of software platforms for identifying peptides and proteins from mass spectra. Within the group we mainly use Mascot (Matrix Science) for discovery proteomics. However, we also have access to other locally installed tools such as the Open Mass Spectrometry Search Algorithm (OMSSA), Sequest, Inspect, XCore, Andromeda and the Trans-Proteomic Pipeline (TPP). We have a 14-node cluster on which we can run this identification software in a high-throughput manner.

Mascot Percolator

Mascot Percolator.

Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. Lukas Käll et al. developed Percolator, a well-performing machine learning algorithm for rescoring tandem MS database search results from Sequest [1]. We developed a software package that interfaces Mascot with Percolator [2]. It automatically extracts and computes relevant features from target/decoy Mascot search results, trains Percolator, applies the resulting classifier to each peptide-spectrum match (PSM) and writes a result file.

Mascot Percolator has been developed as a command line tool and can be readily integrated into existing pipelines or used as a stand-alone application. We explored a large number of features that are relevant to the quality of a PSM, including Mascot scores, parent and fragment mass accuracy, and peptide, protein and ion matching statistics. We have shown that Mascot Percolator substantially outperforms previous Mascot scoring methods for high and low mass accuracy data, in the best case identifying 74 per cent and 49 per cent more unique peptides and 57 per cent and 38 per cent more proteins than the default Mascot Identity and Homology thresholds, respectively.
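The target/decoy principle behind this kind of rescoring can be illustrated with a minimal sketch. This is not the Mascot Percolator code and it does not train a classifier; it only shows one common way of estimating q-values from PSM scores once each PSM is known to come from the target or the decoy database. The function name and the example scores are hypothetical, and the simple decoys/targets FDR estimate assumes target and decoy databases of equal size.

```python
from typing import List, Tuple


def estimate_q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values for PSMs given as (score, is_decoy) pairs.

    Higher scores are assumed to be better. Returns (score, is_decoy, q_value)
    tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list and estimate the FDR at each score threshold
    # as (number of decoys) / (number of targets) accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # The q-value of a PSM is the lowest FDR at which it would still be accepted,
    # i.e. the running minimum of the FDR taken from the bottom of the list upwards.
    q_values = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_values)]


if __name__ == "__main__":
    # Hypothetical scores: True marks a decoy hit.
    example = [(50.0, False), (45.0, False), (40.0, True),
               (38.0, False), (30.0, False), (25.0, True)]
    for score, is_decoy, q in estimate_q_values(example):
        print(f"score={score:5.1f} decoy={is_decoy!s:5} q={q:.2f}")
```

In practice the score fed into such a calculation would be the classifier output rather than a raw search engine score, and PSMs would be accepted up to a chosen q-value threshold.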

Mascot Percolator is documented and can be downloaded from this site.

Identification and Localisation of Post Translational Modifications

A quickly developing area in proteomics is the detection of modified peptides from mass spectra. With the application of high resolution mass spectrometry and multiple fragmentation methods it has become much easier to detect and map peptide and protein modifications. There are a variety of informatics methods to make these detections; however, much of this software is immature and lacks robust validation statistics. One of our group's aims is to evaluate the currently available methods for detecting modifications in mass spectra and, by assembling them into a single pipeline, to validate and localise detections using a universal statistical scoring system.

Peptide and Protein Quantification

As proteomics matures as a field of research there has been a shift from discovery to targeted proteomics, where we want to know not just that a peptide or protein is present but in what quantity. Many quantification methods are available in proteomics; however, nearly all of them need robust informatics and statistical analysis to produce useful results.

PRIDE

Recently we have been working with the PRIDE group at the EBI to help them validate new tools to expedite the submission of proteomics data to their repository. As part of the Sanger Institute we are committed to making our data public, and collaboration with the EBI will help us achieve this.

Proteogenomics and Genome annotation

Eukaryotes Workflow.

Modern mass spectrometry instruments allow rapid collection of proteome wide datasets with high sensitivity. However, routine use of mass spectrometry data for genome annotation (experimental evidence of gene products) is currently limited by the tools available. Processing pipelines are inefficient and not optimised for genome annotation.

Existing software for automatically matching spectra to sequence databases is limited as only known gene products can be found.

Furthermore, post-translational modifications are known to be widely present but are neglected by classical interpretation software.

Prokaryotes Workflow.

This project aims to develop tools and methods to address these issues.

Data processing pipelines will be constructed to allow application to large datasets. The pipeline will be used to process a variety of datasets, including externally available data, and an in-depth analysis will be carried out to compare the peptides observed with existing transcript annotations. Results will be fed into the existing projects to refine genome annotation.

Technology

Biochemistry

The group has long-standing expertise in peptide and protein purification and fractionation. At the level of proteins we utilise a number of fractionation schemes to simplify the proteome, including 1D PAGE, and for proteins and/or peptides we use ion exchange and reverse phase chromatography as well as Offgel fractionation (by isoelectric point). We have numerous tools at our disposal for the isolation of protein complexes and have pioneered innovative methods such as eTAP and peptide affinity purification, as well as the application of immuno- and drug-affinity methods.

TAP-MS

Pipeline for characterisation of tagged protein complexes.

In collaboration with Allan Bradley and Bill Skarnes we have developed eTAP (endogenous Tandem Affinity Purification) tagging technology, in which the endogenous gene is modified to include two small affinity tags to enable efficient and specific recovery of protein assemblies associated with the targeted gene. We have validated this as a very useful approach for mapping protein interactions from cell lines and tissues, both for systematic large-scale applications and for individual genes of interest. Such biochemical methods are used to isolate native protein complexes that participate in specific cellular processes ('molecular machines'), as well as assemblies associated with dynamic biochemical cellular events such as signaling pathways. Composition analysis of such samples identifies the proteins associated with specific biological tasks. This strategy provides insight into molecular context and can be used to link novel genes with biological function. The ability to rapidly identify proteins is critical to this endeavor; it was long a bottleneck that has only recently been overcome by developments in mass spectrometry.

Liquid chromatography tandem mass spectrometry

LC-MS/MS analysis workflow.

State of the art biological mass spectrometry allows multifaceted analysis of proteins and, as a tool, is now approaching proteome scale. Mass spectrometry can provide accurate and high resolution analysis of intact proteins and, more commonly, of peptides derived from enzymatic digestion of proteins. In addition to mass measurement of intact proteins and peptides, gas phase fragmentation along the peptide backbone generates fragment ion rich data, which is used to derive sequence information. The lab has two LC-MS/MS instruments: an LTQ-FT Ultra (hybrid ion trap/FTICR) (Thermo) and a Q-ToF Ultima (Waters). Both are high resolution instruments fitted with current generation auto-injection online nanoscale liquid chromatography systems capable of multidimensional peptide separation. The LTQ-FT Ultra combines high accuracy (<2 ppm) precursor ion mass measurement (MS) with fast fragment ion generation and mass measurement (MS/MS and MSn); alternative fragmentation methods such as IRMPD and ECD are also available. In addition, analytical scale protein/peptide separation by high resolution chromatography (ultra pressure and monolith) is available.
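To make the mass accuracy figure concrete, the following minimal sketch shows how a parts-per-million (ppm) error is usually computed and compared against a tolerance. The function names and the example m/z values are hypothetical; the 2 ppm default simply mirrors the precursor accuracy quoted above.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to its theoretical value, in ppm."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical_mz must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 2.0) -> bool:
    """True if the absolute ppm error does not exceed the given tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical precursor: theoretical m/z 1000.0000, observed m/z 1000.0015.
    print(f"{ppm_error(1000.0015, 1000.0):.2f} ppm")
    print(within_tolerance(1000.0015, 1000.0))  # inside a 2 ppm window
    print(within_tolerance(1000.0030, 1000.0))  # outside a 2 ppm window
```

Because the tolerance is relative, the same 2 ppm window corresponds to a narrower absolute m/z window for small precursors than for large ones.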

Quantitative proteomics

Recently, the field of proteomics has shifted from the use of just descriptive strategies, in which catalogs of proteins and PTMs were generated, to the development and use of quantitative approaches where temporal aspects of protein function can be assayed. The inherent dynamics of proteins, protein modifications and protein complexes require sensitive and comprehensive quantitative methods for their study. A number of such methods have been developed, ranging from label-free methods to chemical labeling of proteins/peptides with stable isotopic labels to stable isotope labeling with amino acids in cell culture (SILAC).

SILAC

SILAC quantification of TAP-tagged protein complexes.

This approach, developed by the Mann lab, uses isotopically labelled (13C6, 15N2 etc.) amino acids (usually arginine and lysine) that are substituted for their naturally occurring forms in cell culture media. Cells can be labelled to near completion with these amino acids, allowing pooling of differentially labelled samples (light/heavy pairs from whole cell lysates, organelle proteomes, protein complexes etc.) for analysis by mass spectrometry. The main advantages of this approach are that samples can be pooled upstream in a proteomic workflow, thereby reducing experimental variation, and that differentially labelled peptides co-elute in liquid chromatography, allowing MS analysis of light and heavy peptide pairs at the same time in an LC-MS/MS experiment. We are exploiting this technology in a triple-label workflow (light, medium and heavy labels) in collaboration with the Pines lab (Gurdon Institute) to assess quantitative differences in protein complexes in different cell states. We use MaxQuant to process SILAC-labelled data and can achieve highly sensitive protein identification and quantification (1000 proteins, recalibration to achieve parts per billion mass accuracy) from an LC-MS/MS analysis of a single SDS-PAGE gel lane in less than 24 hours.
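As an illustration of why light and heavy peptide pairs are easy to recognise in an MS scan, the sketch below computes the expected spacing between the two forms of a peptide. It assumes one common label choice, 13C6 15N2 lysine and 13C6 15N4 arginine; the exact labels used in a given experiment may differ, and the function names and peptide sequences are hypothetical.

```python
# Mass differences between heavy and light isotopes, in Da.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N

# Mass added per labelled residue (assumed labels, see the text above).
HEAVY_SHIFT = {
    "K": 6 * DELTA_13C + 2 * DELTA_15N,  # 13C6 15N2 lysine, about +8.014 Da
    "R": 6 * DELTA_13C + 4 * DELTA_15N,  # 13C6 15N4 arginine, about +10.008 Da
}


def silac_mass_shift(peptide: str) -> float:
    """Mass difference (Da) between the heavy and light forms of a peptide."""
    return sum(HEAVY_SHIFT.get(residue, 0.0) for residue in peptide.upper())


def silac_mz_shift(peptide: str, charge: int) -> float:
    """Spacing in m/z between the heavy and light forms at a given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return silac_mass_shift(peptide) / charge


if __name__ == "__main__":
    # Hypothetical tryptic peptides ending in lysine or arginine.
    for sequence in ("ELVISK", "PEPTIDER"):
        print(sequence,
              f"{silac_mass_shift(sequence):.4f} Da,",
              f"{silac_mz_shift(sequence, 2):.4f} m/z at 2+")
```

Since trypsin cleaves after lysine and arginine, most tryptic peptides carry exactly one labelled residue, which gives a predictable pair spacing for each charge state.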

Chemical labelling

In cases where SILAC labelling is not possible (e.g. human tissue or cells that cannot be cultured) or not cost effective (higher model organisms), alternative approaches such as chemical labelling are used. Peptides from digested proteins (in-solution or in-gel digestion) can be differentially labelled using a combination of light, medium and heavy forms of formaldehyde and cyanoborohydride (Heck lab). Labelled peptides are pooled and analysed by LC-MS/MS, in which a 4 Da mass difference is observed between differentially labelled peptides. Quantitative information is collected continuously at the MS level at high resolution, allowing accurate and sensitive quantification. We are using this chemical labelling approach to quantify protein complexes and to monitor cell signaling dynamics in pathogens.
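The 4 Da spacing can be reproduced from elemental masses. The sketch below assumes the commonly used triplex dimethyl scheme, in which each primary amine (the peptide N-terminus and each lysine side chain) gains two methyl groups; the isotopic compositions given for the three label forms are that assumption, not a statement of the exact reagents used here, and the function names and peptide sequences are hypothetical.

```python
# Monoisotopic masses in Da.
H = 1.00782503
D = 2.01410178    # deuterium
C12 = 12.0
C13 = 13.00335484

# Net mass added per dimethylated amine (two methyl groups added, two hydrogens lost).
LABEL_MASS = {
    "light": 2 * C12 + 4 * H,           # C2H4, about +28.031 Da
    "medium": 2 * C12 + 4 * D,          # C2D4, about +32.056 Da
    "heavy": 2 * C13 + 6 * D - 2 * H,   # 13C2 D6 minus H2, about +36.076 Da
}


def labelled_sites(peptide: str) -> int:
    """Number of primary amines assumed to be labelled: N-terminus plus each lysine.

    Simplification: special cases such as an N-terminal proline are ignored.
    """
    return 1 + peptide.upper().count("K")


def pair_spacing(peptide: str, form_a: str, form_b: str, charge: int = 1) -> float:
    """Spacing in m/z between two label forms of the same peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    delta = LABEL_MASS[form_b] - LABEL_MASS[form_a]
    return labelled_sites(peptide) * delta / charge


if __name__ == "__main__":
    print(f"medium - light: {LABEL_MASS['medium'] - LABEL_MASS['light']:.4f} Da per site")
    print(f"heavy - medium: {LABEL_MASS['heavy'] - LABEL_MASS['medium']:.4f} Da per site")
    # Hypothetical peptide with one lysine: two labelled sites in total.
    print(f"ELVISK light/heavy at 2+: {pair_spacing('ELVISK', 'light', 'heavy', 2):.4f} m/z")
```

The spacing is per labelled amine, so a peptide ending in lysine shows roughly twice the shift of one ending in arginine.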

Post-translational modifications

Methods for purifying phosphoproteomes.

We have a long-term interest in protein phosphorylation and have pioneered the purification of phosphoproteins using protein immobilised metal affinity chromatography (IMAC). In addition, we routinely purify phosphopeptides using IMAC and TiO2 for phosphoproteome mapping and for differential phosphoproteome analysis. We also exploit the enrichment of TAP-tagged proteins to probe protein-centric primary structure, thereby identifying a range of modifications that would not be observable in normal PTM or proteome profiling experiments.

Research & Bioinformatics

Types of research projects.

Our portfolio of research encompasses all levels of proteome complexity, from whole organism characterisation to sub-cellular organelles, protein complexes, PTMs and finally annotation of genomes. Within the Sanger Institute we are contributing to projects covering all programme areas: mouse genetics, pathogen genetics, human genetics and bioinformatics (genome annotation). Mass spectrometry based approaches are being applied to understand the function of genes in terms of the molecular interactions, expression and PTMs of their encoded proteins.

EuTRACC

The Mass Spectrometry group is a principal team in, and contributes data to, EuTRACC, a large-scale international programme for mapping protein interaction networks important in stem cell regulation. In collaboration with Allan Bradley and Bill Skarnes (Mouse Developmental Genetics) we have developed the eTAP (endogenous Tandem Affinity Purification) tagging technology, which introduces an epitope tag in one of the endogenous alleles to allow the generic purification of a protein of interest and its binding partners from murine embryonic stem (ES) cells. We have validated the eTAP approach by elucidating the proteins associated with Oct4, a major regulator of ES cell pluripotency and reprogramming, for which we have generated the most comprehensive list of partners to date. This platform is applicable to high throughput analysis and we are using it to systematically study chromatin associated proteins important for pluripotency and development in ES cells. The data obtained are followed up by selecting candidates for successive rounds of tagging/TAP, as well as ChIP and phenotypical analyses, in order to unravel the regulatory protein networks that control stem cell processes.

Synapse proteomics

The Wellcome Trust funded Genes2Cognition (G2C) programme is an international consortium of scientists studying synaptic molecules and their role in behaviour and disease. Our characterisation of the NMDA receptor complex, the postsynaptic density and the synapse phosphoproteome has provided the basis for several large-scale human and model organism programmes. Proteomics has been an important component of this programme, initially providing the list of molecules for focussed and integrated human and mouse genetic and genomic studies. Comparative analysis of MAGUK-associated signaling complexes (MASC) in different species provided novel insights into the evolutionary origins of the synapse. Subsequently, as part of the consortium, we have been developing and applying state of the art proteomic methods to dissect quantitative changes in synaptic protein complexes in mutant mice, as well as in studies of human synapse proteomes.

Proteomic analysis of chromatin proteins in ES cells

As part of the EuTRACC consortium, we have developed and validated a novel strategy for tagging genes in ES cells for tandem affinity purification-mass spectrometry (TAP-MS) analysis. This strategy ensures the expression of tagged proteins at endogenous levels, and the generation of mice from tagged ES cells can be used to confirm the normal activity of the tagged protein. The analysis of protein:protein and protein:DNA interactions of selected chromatin proteins is currently underway.

References

  • Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores.

    Lawley TD, Croucher NJ, Yu L, Clare S, Sebaihia M, Goulding D, Pickard DJ, Parkhill J, Choudhary J and Dougan G

    Journal of bacteriology 2009;191;17;5377-86

  • A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR and Dougan G

    PLoS genetics 2009;5;7;e1000569

  • Accurate and sensitive peptide identification with Mascot Percolator.

    Brosch M, Yu L, Hubbard T and Choudhary J

    Journal of proteome research 2009;8;6;3176-81

  • Neurotransmitters drive combinatorial multistate postsynaptic density networks.

    Coba MP, Pocklington AJ, Collins MO, Kopanitsa MV, Uren RT, Swamy S, Croning MD, Choudhary JS and Grant SG

    Science signaling 2009;2;68;ra19

  • Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.

    Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS and Grant SG

    Molecular systems biology 2009;5;269

  • Mapping multiprotein complexes by affinity purification and mass spectrometry.

    Collins MO and Choudhary JS

    Current opinion in biotechnology 2008;19;4;324-30

  • Evolutionary expansion and anatomical specialization of synapse proteome complexity.

    Emes RD, Pocklington AJ, Anderson CN, Bayes A, Collins MO, Vickers CA, Croning MD, Malik BR, Choudhary JS, Armstrong JD and Grant SG

    Nature neuroscience 2008;11;7;799-806

  • Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder.

    Collins MO, Yu L, Campuzano I, Grant SG and Choudhary JS

    Molecular & cellular proteomics : MCP 2008;7;7;1331-48

  • Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold.

    Brosch M, Swamy S, Hubbard T and Choudhary J

    Molecular & cellular proteomics : MCP 2008;7;5;962-70

  • Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1.

    Pickard D, Thomson NR, Baker S, Wain J, Pardo M, Goulding D, Hamlin N, Choudhary J, Threfall J and Dougan G

    Journal of bacteriology 2008;190;7;2580-7

Team

Team members

Jyoti Choudhary, Team Leader
James Wright, Mass Spectrometry Data Analyst
Mark Collins, Senior Staff Scientist
Lu Yu, Senior Staff Scientist
Mercedes Pardo, Senior Staff Scientist
Ulrich Omasits, Mass Spectrometry Data Analyst
Enriqueta Banciella, Advanced Research Assistant
Ana Toribio, Research Associate

Jyoti Choudhary

Team Leader

Dr Jyoti Choudhary is Head of Proteomic mass spectrometry at the Wellcome Trust Sanger Institute. She received her Ph.D. from Imperial College London, in the Biological Mass Spectrometry group of Professor Howard Morris, where she worked on developing biochemical and analytical methods for elucidating the primary structure of proteins. She continued her research as a post-doctoral fellow by establishing novel methods to purify and characterise membrane protein complexes by mass spectrometry. In 1997 she joined the Bioanalytical Sciences division at GlaxoWellcome. She was selected as a group leader in the CellMap project, founded to pursue the development of proteomics technologies and investigate their value in drug discovery. This unit was spun out of GlaxoSmithKline, and she became a founding member of Cellzome AG, in the UK. She led the analytical group and contributed to establishing the TAP-MS platform for characterising mammalian protein complexes. The team used this technology to systematically characterise protein complexes of a key human biological pathway, the APP processing pathway of Alzheimer's disease. This study is one of the largest functional proteomics studies of a disease pathway, and discoveries from this programme underpin therapeutics projects underway in the company.

Research

Dr Choudhary's research group at the Sanger Institute is focused on developing and applying proteomic mass spectrometry methods for studying protein interactions and cell signaling.

James Christopher Wright

Computational Biologist

I initially studied for an undergraduate BSc (Hons) in Biological and Computational Science at UMIST. During this degree I spent one year working within EST Informatics at AstraZeneca, where I focussed on the analysis and exploitation of microarray data. My final degree dissertation attempted to use machine learning methods to classify genomic sequences as introns or exons. Following my degree I studied for an MSc in Physical Methods for Bioanalysis and Post Genomic Science at the University of Manchester. My dissertation for this degree attempted to use InterPro domains to classify phosphatases and map them to ontologies. In 2005 I took up a NERC funded PhD position at Liverpool University with Prof. Rob Beynon and with Dr Simon Hubbard at the University of Manchester. My PhD tackled the problems faced in cross species proteomics, applying both lab based and in silico methods. My thesis investigated methods for creating species independent search databases, the use of proteomics to assist genome annotation in Aspergillus niger (proteogenomics), and the use of species independent spectral profile libraries. In 2009 I joined Jyoti Choudhary's team here at the Wellcome Trust Sanger Institute.

Research

I am developing the software tool Mascot Percolator, expanding its capabilities and applying it in new ways. This has led to much research examining different mass spectrometry fragmentation types. I am also developing innovative proteomics pipelines for the identification and localisation of peptide and protein modifications.

Publications

  • Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome.

    Brosch M, Saunders GI, Frankish A, Collins MO, Yu L, Wright J, Verstraten R, Adams DJ, Harrow J, Choudhary JS and Hubbard T

    Genome research 2011;21;5;756-67

  • Cross species proteomics.

    Wright JC, Beynon RJ and Hubbard SJ

    Methods in molecular biology (Clifton, N.J.) 2010;604;123-35

  • Recent developments in proteome informatics for mass spectrometry analysis.

    Wright JC and Hubbard SJ

    Combinatorial chemistry & high throughput screening 2009;12;2;194-202

  • Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Wright JC, Sugden D, Francis-McIntyre S, Riba-Garcia I, Gaskell SJ, Grigoriev IV, Baker SE, Beynon RJ and Hubbard SJ

    BMC genomics 2009;10;61
