Bioinformatics
Like most post-genomic sciences, proteomics is heavily dependent on informatics. Our group applies a wide range of informatics tools and applications to our data. We collaborate with software providers to beta test and validate new proteomics software, and we also develop new algorithms and tools ourselves.
Peptide and Protein Identification
There is a myriad of software platforms for identifying peptides and proteins from mass spectra. Within the group we mainly use Mascot (Matrix Science) for discovery proteomics. However, we also have access to other locally installed tools such as the Open Mass Spectrometry Search Algorithm (OMSSA), Sequest, Inspect, XCore, Andromeda and the Trans-Proteomic Pipeline (TPP). We have a 14-node cluster on which we can run this identification software in a high-throughput manner.
Mascot Percolator
Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. Lukas Käll et al. developed a well-performing machine learning algorithm, called Percolator, for rescoring tandem MS database search results from Sequest [1]. We developed a software package that interfaces Mascot with Percolator [2]. It automatically extracts and computes relevant features from target/decoy Mascot search results, trains Percolator, applies the resulting classifier to each peptide-spectrum match (PSM) and writes a result file.
Mascot Percolator has been developed as a command line tool and can be readily integrated into existing pipelines or used as a stand-alone application. A large number of features relevant to the quality of a PSM were explored, including Mascot scores, parent and fragment mass accuracy, and peptide, protein and ion matching statistics. We have shown that Mascot Percolator substantially outperforms previous Mascot scoring methods for both high and low mass accuracy data, in the best case identifying 74 per cent and 49 per cent more unique peptides and 57 per cent and 38 per cent more proteins than the default Mascot Identity and Homology thresholds respectively.
Mascot Percolator is documented and can be downloaded from this site.
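The target/decoy principle that this kind of rescoring relies on can be illustrated with a minimal Python sketch. This is not Mascot Percolator's code: the function name, the example scores and the simple FDR estimate (number of decoys divided by number of targets at each score threshold) are assumptions made purely for illustration.

```python
from typing import List, Tuple


def target_decoy_q_values(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, bool, float]]:
    """Assign a q-value to each PSM given as (score, is_decoy).

    Hypothetical helper: FDR at a threshold is estimated as
    decoys / targets among all PSMs scoring at or above it.
    """
    # Best-scoring PSMs first.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    targets = 0
    decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the lowest FDR at which a PSM would still be accepted,
    # i.e. the running minimum of the FDR from the worst score upwards.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_values)]


if __name__ == "__main__":
    example = [(50, False), (45, False), (40, True),
               (35, False), (30, False), (20, True)]
    for score, is_decoy, q in target_decoy_q_values(example):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

A rescoring tool improves on this by replacing the single search score with a learned combination of many features, but the accepted PSM list is still cut at a chosen q-value in the same way.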
Identification and Localisation of Post Translational Modifications
A rapidly developing area in proteomics is the detection of modified peptides from mass spectra. With the application of high resolution mass spectrometry and multiple fragmentation methods, it has become much easier to detect and map peptide and protein modifications. A variety of informatics methods exist for making these detections; however, much of this software is immature and lacks robust validation statistics. One of our group's aims is to evaluate all the currently available methods for detecting modifications in mass spectra and, by assembling them into a single pipeline, to validate and localise detections using a universal statistical scoring system.
Peptide and Protein Quantification
As proteomics matures as a field of research, there has been a shift from discovery to targeted proteomics, where we want to know not just that a peptide or protein is present but in what quantity. Many quantification methods are available in proteomics; however, nearly all of them need robust informatics and statistical analysis to produce useful results.
PRIDE
Recently we have been working with the PRIDE group at the EBI to help them validate new tools that expedite the submission of proteomics data to their repository. As part of the Sanger Institute, we are committed to making our data public, and collaboration with the EBI will help us achieve this.
Proteogenomics and Genome annotation
Modern mass spectrometry instruments allow rapid collection of proteome-wide datasets with high sensitivity. However, routine use of mass spectrometry data for genome annotation (experimental evidence of gene products) is currently limited by the tools available. Processing pipelines are inefficient and not optimised for genome annotation.
Existing software for automatically matching spectra to sequence databases is limited because only known gene products can be found.
Furthermore, post-translational modifications are known to be widely present but are neglected by classical interpretation software.
This project aims to develop tools and methods to address these issues.
Data processing pipelines will be constructed to allow application to large datasets. The pipeline will be used to process a variety of datasets, including externally available data, and an in-depth analysis will be carried out to compare the peptides observed with existing transcript annotations. Results will be fed into the existing projects to refine genome annotation.
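One common way around the "known gene products only" limitation is to search spectra against a six-frame translation of the genome instead of an annotated protein database. The text above does not say which approach this project takes, so the following Python sketch only illustrates the general idea; all function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate all three frames of both strands, keyed '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In practice the translated frames would be split at stop codons and digested in silico before being used as a search database, and peptides matching outside existing transcript annotations would be the candidates for refining gene models.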
Technology
Biochemistry
The group has long-standing expertise in peptide and protein purification and fractionation. At the protein level we utilise a number of fractionation schemes to simplify the proteome, including 1D PAGE, and for proteins and/or peptides we use ion exchange and reverse phase chromatography as well as Offgel fractionation (by isoelectric point). We have numerous tools at our disposal for isolation of protein complexes and have pioneered innovative methods such as eTAP and peptide affinity purification, as well as the application of immuno- and drug-affinity methods.
TAP-MS

Pipeline for characterisation of tagged protein complexes.
In collaboration with Allan Bradley and Bill Skarnes we have developed eTAP (endogenous Tandem Affinity Purification) tagging technology, in which the endogenous gene is modified to include two small affinity tags to enable efficient and specific recovery of protein assemblies associated with the targeted gene. We have validated this as a very useful approach for mapping protein interactions from cell lines and tissues, both for systematic large scale applications and for individual genes of interest. Such biochemical methods are used to isolate native protein complexes that participate in specific cellular processes ('molecular machines'), as well as assemblies associated with dynamic biochemical cellular events such as signaling pathways. Composition analysis of such samples identifies the proteins associated with specific biological tasks. This strategy provides insight into molecular context and can be used to link novel genes with biological function. The ability to rapidly identify proteins is critical to this endeavor and has only recently been made possible by developments in mass spectrometry.
Liquid chromatography tandem mass spectrometry
State-of-the-art biological mass spectrometry allows multifaceted analysis of proteins and, as a tool, is now approaching proteome scale. Mass spectrometry can provide accurate, high resolution analysis of intact proteins and, more commonly, of peptides derived from enzymatic digestion of proteins. In addition to mass measurement of intact proteins and peptides, gas phase fragmentation along the peptide backbone generates fragment ion rich data, which are used to derive sequence information. The lab has two LC-MS/MS instruments: an LTQ-FT Ultra (hybrid ion trap/FTICR) (Thermo) and a Q-ToF Ultima (Waters). Both are high resolution instruments fitted with current generation auto-injection online nanoscale liquid chromatography systems capable of multidimensional peptide separation. On the LTQ-FT Ultra, high accuracy (<2 ppm) precursor ion mass measurement (MS) is combined with fast fragment ion generation and mass measurement (MS/MS and MSn); alternative fragmentation methods such as IRMPD and ECD are also available. In addition, analytical scale protein/peptide separation by high resolution chromatography (ultra pressure and monolith) is available.
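To make the "<2 ppm" figure concrete, here is a small Python sketch of how a mass error in parts per million is computed and checked against a tolerance. The function names and the example m/z values are hypothetical; only the 2 ppm default comes from the text above.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measurement in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(
    observed_mz: float, theoretical_mz: float, tolerance_ppm: float = 2.0
) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


if __name__ == "__main__":
    # At m/z 1000, 1 ppm corresponds to 0.001 m/z units.
    print(round(ppm_error(1000.001, 1000.0), 3))
    print(within_tolerance(1000.001, 1000.0))
    print(within_tolerance(1000.005, 1000.0))
```

A tight precursor tolerance of this kind sharply reduces the number of candidate peptides a search engine has to consider for each spectrum.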
Quantitative proteomics
Recently, the field of proteomics has shifted from the use of just descriptive strategies, in which catalogs of proteins and PTMs were generated, to the development and use of quantitative approaches where temporal aspects of protein function can be assayed. The inherent dynamics of proteins, protein modifications and protein complexes require sensitive and comprehensive quantitative methods for their study. A number of such methods have been developed, ranging from label-free methods to chemical labeling of proteins/peptides with stable isotopic labels to stable isotope labeling with amino acids in cell culture (SILAC).
SILAC

SILAC quantification of TAP-tagged protein complexes.
This approach, developed by the Mann lab, uses isotopically labelled (13C6, 15N2 etc.) amino acids (usually arginine and lysine) that are substituted for their naturally occurring forms in cell culture media. Cells can be labelled to near completion with these amino acids, allowing differentially labelled samples (light/heavy pairs from whole cell lysates, organelle proteomes, protein complexes etc.) to be pooled for analysis by mass spectrometry. The main advantages of this approach are that samples can be pooled upstream in a proteomic workflow, thereby reducing experimental variation, and that differentially labelled peptides co-elute in liquid chromatography, so light and heavy peptide pairs are analysed by MS at the same time in an LC-MS/MS experiment. We are exploiting this technology in a triple label workflow (light, medium and heavy labels) in collaboration with the Pines lab (Gurdon Institute) to assess quantitative differences in protein complexes in different cell states. We are using MaxQuant to process SILAC labelled data and we can achieve highly sensitive protein identification and quantification (1000 proteins, recalibration to achieve parts per billion mass accuracy) from an LC-MS/MS analysis of a single SDS-PAGE gel lane in less than 24 hours.
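The arithmetic behind a light/heavy SILAC pair can be sketched in a few lines of Python. This is an illustration only and is unrelated to how MaxQuant works internally: the function names are hypothetical, the isotope mass differences are standard values rounded to six decimals, and the intensities in the example are made up.

```python
import math

# Mass gained by swapping one atom for its heavy isotope (Da, rounded).
C13_SHIFT = 1.003355  # 13C minus 12C
N15_SHIFT = 0.997035  # 15N minus 14N


def label_shift(n_13c: int, n_15n: int) -> float:
    """Mass added to one residue carrying the given numbers of 13C and 15N."""
    return n_13c * C13_SHIFT + n_15n * N15_SHIFT


def pair_mz_spacing(
    n_labelled_residues: int, charge: int, n_13c: int = 6, n_15n: int = 2
) -> float:
    """m/z distance between the light and heavy forms of one peptide."""
    return n_labelled_residues * label_shift(n_13c, n_15n) / charge


def log2_ratio(heavy_intensity: float, light_intensity: float) -> float:
    """Relative abundance of heavy versus light on a log2 scale."""
    return math.log2(heavy_intensity / light_intensity)


if __name__ == "__main__":
    # A 13C6 15N2 label adds about 8.014 Da per labelled residue.
    print(round(label_shift(6, 2), 4))
    # Doubly charged peptide with one labelled residue: about 4.007 m/z apart.
    print(round(pair_mz_spacing(1, charge=2), 4))
    print(log2_ratio(4.0e6, 1.0e6))
```

Because both forms co-elute, the two peaks are read from the same MS scans, which is why the ratio is largely insensitive to run-to-run variation.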
Chemical labelling
In cases where SILAC labelling is not possible (e.g. human tissue, cells that cannot be cultured etc.) or not cost effective (higher model organisms), alternative approaches such as chemical labelling are used. Peptides from digested proteins (solution digest/in gel digestion) can be differentially labelled using a combination of light, medium and heavy forms of formaldehyde and cyanoborohydride (Heck lab). Labelled peptides are pooled and analysed by LC-MS/MS, in which a 4 Da mass difference is observed between differentially labelled peptides. Quantitative information is collected continuously at the MS level at high resolution, allowing accurate and sensitive quantification. We are using this chemical labelling approach to quantify protein complexes and to monitor cell signaling dynamics in pathogens.
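As a rough illustration of where the 4 Da difference shows up in a spectrum, the Python sketch below counts the labelling sites of a peptide and derives the expected spacing between adjacent label channels. It rests on stated assumptions: the nominal 4 Da shift from the text is taken to apply per labelled primary amine, every peptide N-terminus and lysine side chain is assumed to be labelled, and the function names and example peptides are hypothetical.

```python
# Nominal mass difference between adjacent label channels per labelled amine,
# taken from the 4 Da figure quoted in the text (assumption: per site).
NOMINAL_SHIFT_PER_SITE = 4.0


def labelling_sites(peptide: str) -> int:
    """Primary amines assumed to be labelled: the N-terminus plus each lysine."""
    return 1 + peptide.upper().count("K")


def channel_spacing(peptide: str, charge: int = 1) -> float:
    """Expected m/z spacing between adjacent channels for this peptide."""
    return labelling_sites(peptide) * NOMINAL_SHIFT_PER_SITE / charge


if __name__ == "__main__":
    for sequence, z in (("SAMPLER", 1), ("LVNELTEFAK", 2)):
        print(sequence, labelling_sites(sequence), channel_spacing(sequence, z))
```

Exact shifts differ slightly from the nominal value, so real quantification software works with accurate masses rather than the rounded figure used here.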
Post-translational modifications
We have a long-term interest in protein phosphorylation and have pioneered the purification of phosphoproteins using protein immobilised metal affinity chromatography (IMAC). In addition, we routinely purify phosphopeptides using IMAC and TiO2 for phosphoproteome mapping and for differential phosphoproteome analysis. We also exploit the enrichment of TAP-tagged proteins to probe protein-centric primary structure, thereby identifying a range of modifications that would not be observable in normal PTM or proteome profiling experiments.
Research & Bioinformatics
Our portfolio of research encompasses all levels of proteome complexity, from whole organism characterisation to sub-cellular organelles, protein complexes, PTMs and finally annotation of genomes. Within the Sanger Institute we are contributing to projects covering all programme areas: mouse genetics, pathogen genetics, human genetics and bioinformatics (genome annotation). Mass spectrometry based approaches are being applied to understand the function of genes in terms of molecular interactions, expression and PTMs of their encoded proteins.
EuTRACC
The Mass Spectrometry group is a principal team and contributes data to EuTRACC, a large scale international programme for mapping protein interaction networks important in stem cell regulation. In collaboration with Allan Bradley and Bill Skarnes (Mouse Developmental Genetics) we have developed the eTAP (endogenous Tandem Affinity Purification) tagging technology, which introduces an epitope tag in one of the endogenous alleles to allow the generic purification of a protein of interest and its binding partners from murine embryonic stem (ES) cells. We have validated the eTAP approach with the elucidation of proteins associated with Oct4, a major regulator of ES cell pluripotency and reprogramming, where we have generated the most comprehensive list of partners to date. This platform is applicable to high throughput analysis and we are using it to systematically study chromatin associated proteins important for pluripotency and development in ES cells. The data obtained are followed up by selecting candidates for successive rounds of tagging/TAP, as well as ChIP and phenotypical analyses, in order to unravel the regulatory protein networks that control stem cell processes.
Synapse proteomics
The Wellcome Trust funded Genes2Cognition (G2C) programme is an international consortium of scientists studying synaptic molecules and their role in behaviour and disease. Our characterisation of the NMDA receptor complex, the postsynaptic density and the synapse phosphoproteome has provided the basis for several large-scale human and model organism programmes. Proteomics has been an important component of this programme, initially providing the list of molecules for focussed and integrated human and mouse genetic and genomic studies. Comparative analysis of MAGUK-associated signaling complexes (MASC) in different species provided novel insights into the evolutionary origins of the synapse. Subsequently, as part of the consortium, we have been developing and applying state-of-the-art proteomic methods to dissect quantitative changes in synaptic protein complexes in mutant mice, as well as in studies of human synapse proteomes.
Proteomic analysis of chromatin proteins in ES cells
As part of the EuTRACC consortium, we have developed and validated a novel strategy for tagging of genes in ES cells for tandem affinity purification-mass spectrometry (TAP-MS) analysis. This strategy ensures the expression of tagged proteins at endogenous levels, and the generation of mice from tagged ES cells can be used to confirm the normal activity of the tagged protein. The analysis of protein:protein and protein:DNA interactions of selected chromatin proteins is currently underway.
Collaborations
Internal collaborations
- Human genetics
  - Genome dynamics and evolution group (Matthew Hurles)
- Model organisms
  - Mouse developmental genetics (Bill Skarnes)
  - Genes to cognition (Seth Grant)
- Pathogens
  - Microbial pathogenesis (Gordon Dougan)
  - Malaria programme: Billker group (Oliver Billker)
  - Malaria programme: Rayner group (Julian Rayner)
- Bioinformatics
  - Vertebrate genome analysis (Tim Hubbard)
External collaborations
- Jonathon Pines Laboratory (Gurdon Institute, University of Cambridge)
- Draviam Lab (Department of Genetics, University of Cambridge)
References
- Proteomic and genomic characterization of highly infectious Clostridium difficile 630 spores. Journal of bacteriology 2009;191;17;5377-86. PUBMED: 19542279; PMC: 2725610; DOI: 10.1128/JB.00597-09
- A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS genetics 2009;5;7;e1000569. PUBMED: 19609351; PMC: 2704369; DOI: 10.1371/journal.pgen.1000569
- Accurate and sensitive peptide identification with Mascot Percolator. Journal of proteome research 2009;8;6;3176-81. PUBMED: 19338334; PMC: 2734080; DOI: 10.1021/pr800982s
- Neurotransmitters drive combinatorial multistate postsynaptic density networks. Science signaling 2009;2;68;ra19. PUBMED: 19401593; PMC: 3280897; DOI: 10.1126/scisignal.2000102
- Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins. Molecular systems biology 2009;5;269. PUBMED: 19455133; PMC: 2694677; DOI: 10.1038/msb.2009.27
- Mapping multiprotein complexes by affinity purification and mass spectrometry. Current opinion in biotechnology 2008;19;4;324-30. PUBMED: 18598764; DOI: 10.1016/j.copbio.2008.06.002
- Evolutionary expansion and anatomical specialization of synapse proteome complexity. Nature neuroscience 2008;11;7;799-806. PUBMED: 18536710; DOI: 10.1038/nn.2135
- Phosphoproteomic analysis of the mouse brain cytosol reveals a predominance of protein phosphorylation in regions of intrinsic sequence disorder. Molecular & cellular proteomics : MCP 2008;7;7;1331-48. PUBMED: 18388127; DOI: 10.1074/mcp.M700564-MCP200
- Comparison of Mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted Mascot threshold. Molecular & cellular proteomics : MCP 2008;7;5;962-70. PUBMED: 18216375; PMC: 2656932; DOI: 10.1074/mcp.M700293-MCP200
- Molecular characterization of the Salmonella enterica serovar Typhi Vi-typing bacteriophage E1. Journal of bacteriology 2008;190;7;2580-7. PUBMED: 18192390; PMC: 2293211; DOI: 10.1128/JB.01654-07
Team
Members
- Jyoti Choudhary, Team Leader
- James Wright, Mass Spectrometry Data Analyst
- Mark Collins, Senior Staff Scientist
- Lu Yu, Senior Staff Scientist
- Mercedes Pardo, Senior Staff Scientist
- Ulrich Omasits, Mass Spectrometry Data Analyst
- Enriqueta Banciella, Advanced Research Assistant
- Ana Toribio, Research Associate
Previous Members
- Vivek Iyer
- Sajani Swamy
- Markus Brosch
- Mark Bushell
- Parthiban Vijayarangakaannan
- Tannia Gracia Bustos
Jyoti Choudhary
Team Leader
Dr Jyoti Choudhary is Head of Proteomic Mass Spectrometry at the Wellcome Trust Sanger Institute. She received her Ph.D. from Imperial College London, in the Biological Mass Spectrometry group of Professor Howard Morris, where she worked on developing biochemical and analytical methods for elucidating the primary structure of proteins. She continued her research as a post-doctoral fellow by establishing novel methods to purify and characterise membrane protein complexes by mass spectrometry. In 1997 she joined the Bioanalytical Sciences division at GlaxoWellcome. She was selected as a group leader for the CellMap project, founded to pursue the development of proteomics technologies and investigate their value in drug discovery. This unit was spun out of GlaxoSmithKline, and she became a founding member of Cellzome AG in the UK. She led the analytical group and contributed to establishing the TAP-MS platform for characterising mammalian protein complexes. The team used this technology to systematically characterise protein complexes of a key human biological pathway, the APP processing pathway of Alzheimer's disease. This study is one of the largest functional proteomics studies of a disease pathway, and discoveries from this programme underpin therapeutics projects underway in the company.
Research
Dr Choudhary's research group at the Sanger Institute is focused on developing and applying proteomic mass spectrometry methods for studying protein interactions and cell signaling.
James Christopher Wright
Computational Biologist
I initially studied for a BSc (Hons) in Biological and Computational Science at UMIST. During this degree I spent one year working within EST Informatics at AstraZeneca, where I focussed on the analysis and exploitation of microarray data. My final degree dissertation attempted to use machine learning methods to classify genomic sequences as introns or exons. Following my degree I studied for an MSc in Physical Methods for Bioanalysis and Post Genomic Science at the University of Manchester. My dissertation for this degree attempted to use InterPro domains to classify phosphatases and map them to ontologies. In 2005 I took up a NERC funded PhD position at Liverpool University with Prof. Rob Beynon and Dr Simon Hubbard at the University of Manchester. My PhD tackled the problems faced in cross species proteomics, applying both lab based and in silico methods. My thesis investigated methods for creating species independent search databases, using proteomics to assist genome annotation in Aspergillus niger (proteogenomics), and the use of species independent spectral profile libraries. In 2009 I joined Jyoti Choudhary's team here at the Wellcome Trust Sanger Institute.
Research
I am developing the software tool Mascot Percolator, expanding its capabilities and applying it in new ways. This has led to much research examining different mass spectrometry fragmentation types. I am also developing innovative proteomics pipelines for the identification and localisation of peptide and protein modifications.
Publications
- Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome. Genome research 2011;21;5;756-67. PUBMED: 21460061; PMC: 3083093; DOI: 10.1101/gr.114272.110
- Cross species proteomics. Methods in molecular biology (Clifton, N.J.) 2010;604;123-35. PUBMED: 20013368; DOI: 10.1007/978-1-60761-444-9_9
- Recent developments in proteome informatics for mass spectrometry analysis. Combinatorial chemistry & high throughput screening 2009;12;2;194-202. PUBMED: 19199887
- Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. BMC genomics 2009;10;61. PUBMED: 19193216; PMC: 2644712; DOI: 10.1186/1471-2164-10-61








